Hey guys,

What’s currently the best LLM for a low-VRAM machine? I have 6 GB of VRAM and 32 GB of RAM.

I’m experimenting a little with SillyTavern and I’m curious which model gets the most out of my setup. It should be multilingual and suitable for “casual chatting”.

I know I will probably not get very far with this, but I’m still interested in how far we’ve already come.

(I’m using KoboldCPP, if that matters.)

~sp3ctre

  • Multiplexer@discuss.tchncs.de
    7 days ago

    I have a Qwen3.6-35b-a3b model running on a dated desktop machine with 4 GB VRAM.
    I use an 8-bit quant, but I also have 48 GB of system RAM.
    It delivers ~7 tokens/s, which is already perfectly usable for most things.
    I also tried it on my recent Core i7 company laptop with 8 GB VRAM and got 20 tokens/s.
    Oh, and I am also using KoboldCPP (on Linux).
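
For anyone wanting to sanity-check numbers like these, here is a rough back-of-the-envelope sketch of how much memory the weights alone need at a given quantization. It assumes the "35b" in the model name means roughly 35 billion total parameters, and it treats a quant as exactly N bits per weight; real quant formats carry a bit of extra metadata, and context/KV cache comes on top. The function names and the `overhead_gb` value are made up for illustration and are not taken from KoboldCPP.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   vram_gb: float, ram_gb: float,
                   overhead_gb: float = 2.0) -> bool:
    """Crude check: do weights plus an assumed overhead fit into VRAM + RAM combined?

    overhead_gb is a placeholder guess for context cache and runtime buffers;
    it does not account for what the OS and other programs already use.
    """
    needed = weight_size_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb + ram_gb


if __name__ == "__main__":
    # Assumed: ~35 billion parameters, idealized 8-bit and 4-bit quants.
    for bits in (8, 4):
        print(f"{bits}-bit: ~{weight_size_gb(35, bits):.1f} GB of weights")
    # The 4 GB VRAM + 48 GB RAM desktop from the reply above.
    print("fits in 4 GB VRAM + 48 GB RAM at 8-bit:", fits_in_memory(35, 8, 4, 48))
```

Under these assumptions the 8-bit weights are far larger than 4 GB, so most of the model has to sit in system RAM. That is why the amount of RAM matters as much as the VRAM here.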