OhneHose

OhneHose@feddit.org · 6 days ago

I’d not use ollama, it’s basically just a fancy wrapper around lama.cpp.

There’s also modules/docker containers to hot swap models with lama.cpp

My model hosting setup is: Lama.cpp -> Open web UI

Lama.cpp is running in a local shell on my Mac Mini, since setting up GPU support with metal is (or was?) a pain. And open web UI sits in a docker with a local storage mounted so it have persistence when updating or moving the docker.

16gigs vram however ain’t too much, you’ll be fairly limited to fairly low quants. It will be reasonably fast tho. If you can use most of your system ram you could go and host f.e. qwen 3.6 bf8(~56gb) or bf4 (~30gb). It would be slower but you also gain a lot of usability from that.

Or you host two models a smaller one on the GPU and bigger one with system ram so you can switch between “knowledge” and speed.

Using lama.cpp you’ll have to take a look at huggingface & use gguf models.

OhneHose@feddit.org · 8 days ago

They are influencing elections since 2015/2016. There’s really no surprise here.

Cambridge analytica even made a talk in how they influenced the first trump election.

OhneHose@feddit.org · 9 days ago

Because then you get no advertising moneyzzzz