What's your self-hosting success of the week?

shark@lemmy.org · 29 days ago


Shimitar@downonthestreet.eu · 29 days ago

I plugged an NVIDIA GPU into my server and enabled Ollama to use it, diligently updated my public wiki about it, and am now enjoying real-time gpt-oss model responses!

I was amazed: response times went from 3-8 minutes down to seconds. I have an Intel Core 7 with 48 GB of RAM, but even an oldish GPU beats the crap out of it.
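If you want to confirm Ollama is really running a model on the GPU and not falling back to the CPU, its `/api/ps` endpoint lists the loaded models with their total `size` and the `size_vram` portion, both in bytes. A minimal sketch, assuming Ollama listens on its default address `http://localhost:11434`; the helper names `gpu_fraction` and `loaded_models` are hypothetical:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address


def gpu_fraction(model_info: dict) -> float:
    """Share of a loaded model's memory that sits in VRAM (0.0 to 1.0)."""
    size = model_info.get("size", 0)
    if size == 0:
        return 0.0  # avoid dividing by zero on incomplete entries
    return model_info.get("size_vram", 0) / size


def loaded_models(base_url: str = OLLAMA_URL) -> list:
    """Fetch the list of currently loaded models from Ollama's /api/ps."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp).get("models", [])
```

Usage: `for m in loaded_models(): print(m["name"], f"{gpu_fraction(m):.0%}")`. Anything well below 100% means part of the model spilled into system RAM, which is where the slowdown comes from.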

sharkaccident@lemmy.world · 29 days ago

What GPU and model do you use?

Shimitar@downonthestreet.eu · 29 days ago

NVIDIA Corporation GA104GL [RTX A4000] (rev a1)

From lspci

It has 16 GB of VRAM: not a lot, but enough to run gpt-oss:20b and a few other models pretty nicely.
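A rough rule of thumb for whether a model fits is parameters × bits per weight ÷ 8 for the weights, plus some headroom for the context (KV cache) and runtime. The sketch below assumes roughly 4-bit quantization and a flat 2 GB of overhead; both numbers are illustrative assumptions, not measured values for gpt-oss or any specific build:

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits, divided by 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM."""
    return estimate_weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# A 20B model at ~4 bits is about 10 GB of weights, which leaves room on a 16 GB card;
# the same model at 16 bits (about 40 GB) would not fit.
```

Longer contexts grow the KV cache, so the real headroom you need can be larger than the fixed overhead assumed here.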

I've noticed it's better to stick to a single model; I imagine unloading and reloading models in VRAM takes time.
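By default Ollama unloads a model after it has been idle for five minutes, so the next request pays the load cost again. The `keep_alive` field on API requests (or the `OLLAMA_KEEP_ALIVE` environment variable on the server) changes that: a duration like `"30m"` keeps the model loaded for that long, and a negative value keeps it loaded indefinitely. A minimal sketch using only the standard library, assuming the default address; the helper names are hypothetical:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address


def build_generate_request(model: str, prompt: str, keep_alive=-1,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request.

    keep_alive=-1 asks Ollama to keep the model in memory indefinitely;
    a duration string such as "30m" keeps it loaded for that long instead.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "keep_alive": keep_alive,
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, keep_alive=-1) -> str:
    """Send the request and return the generated text from the "response" field."""
    req = build_generate_request(model, prompt, keep_alive)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]
```

Usage: `generate("gpt-oss:20b", "Hello!")`. Pinning one model this way trades VRAM for latency, which matches sticking to a single model on a 16 GB card.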