I recently made a post about the 35B MoE. Now the dense 27B variant has been released.


  • SuspciousCarrot78@lemmy.world · 2 days ago

    Rules of thumb

    • For a 27B: if you want it to run entirely on your GPU, you’ll need a quantisation that fits, plus room for the KV cache. So (for example), if your model GGUF were 10GB, I’d leave another 2GB for the KV cache, meaning you’d need 12GB to run it with a reasonable context length. I haven’t looked at the quants for Qwen3.6 27B yet…I imagine the “good baseline” quant is what…12? 15GB?

    Having said that, remember that 1) you can split the model between CPU and GPU, and 2) you can use lower quants. So, if you have “just” 12GB, a lower quant (I dunno…IQ3_XS?) might get you over the line.

    • You can run it however you want :) For someone brand new, the best all-in-one options are Ollama and Jan.ai.

    • Yes. Jan.ai has MCP tooling (I imagine Ollama does as well), so you can follow the how-tos to set that up. Read their docs? What do you need to do with MCP?

    • What you should know: you’ll reach a point where “more parameters = better performance” needs to be balanced against cost and smarter tooling. Don’t be tempted to drop $$$ on something thinking you can just throw money at the problem to make it go away.
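The “GGUF size + KV cache” rule of thumb above can be sketched as a quick calculation. This is a rough sketch only; the layer count, KV-head count, and head dimension below are hypothetical stand-ins for whatever the real model card says, and real runtimes add some overhead on top.

```python
# Rough VRAM estimate for running a GGUF model fully on GPU:
# model file size + KV cache. Hypothetical architecture numbers;
# check the model card for real values.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_val=2):
    """KV cache size in GB (fp16 values by default)."""
    # 2 tensors (K and V) per layer, one vector per token per KV head
    n_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_val
    return n_bytes / 1024**3

def total_vram_gb(gguf_gb, n_layers, n_kv_heads, head_dim, ctx_len):
    """Model weights on disk plus KV cache at the given context length."""
    return gguf_gb + kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx_len)

# Example: a 10 GB quant of a hypothetical 27B with grouped-query attention
# (62 layers, 8 KV heads, head_dim 128) at 16k context:
print(round(kv_cache_gb(62, 8, 128, 16384), 2))        # ≈ 3.88 GB of KV cache
print(round(total_vram_gb(10, 62, 8, 128, 16384), 2))  # ≈ 13.88 GB total
```

Halving the context length (or quantising the KV cache to 8-bit, where the runtime supports it) halves the cache term, which is why a tight fit can often be rescued by shrinking the context.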

    • venusaur@lemmy.world · 2 days ago

      Thanks! I’m experimenting on my laptop with 16GB RAM and no GPU/VRAM. I installed llama.cpp and am testing Gemma 7B Q5, but it’s not answering prompts correctly. It analyzes the prompt instead of answering the question, or it gives me a poem haha. Trying to figure it out.

      Any lightweight model you recommend for just chat experimenting for now? Can they connect to the internet?

      • SuspciousCarrot78@lemmy.world · 2 days ago (edited)

        I’ll never not recommend Qwen3-4B 2507 Instruct…because despite being ancient in AI terms (so, 8 months lol) it’s solid. Notably, the base models in Jan are all Qwen3-4B variants.

        Most models can search the web if they have access to a web-search tool.
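To make the last point concrete: the model never reaches the internet itself; the host app (Jan, Ollama, or your own script) detects a tool call in the model’s output, runs the search, and feeds the result back as another message. A minimal mock of that loop, with every name here (`fake_model`, `fake_web_search`) a hypothetical stand-in rather than any real API:

```python
# Minimal sketch of the tool-use loop behind "web search".
# Both functions are mocks for illustration; real apps wire the model
# and the search tool up via MCP or built-in tool support.

def fake_model(messages):
    """Stand-in for the LLM: asks for a search, then answers with it."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_call": {"name": "web_search", "query": last["content"]}}
    return {"content": f"Based on the search result: {last['content']}"}

def fake_web_search(query):
    """Stand-in for the search tool the host app provides."""
    return f"(top result for '{query}')"

def chat(user_msg):
    messages = [{"role": "user", "content": user_msg}]
    reply = fake_model(messages)
    while "tool_call" in reply:  # the HOST executes tool calls, not the model
        result = fake_web_search(reply["tool_call"]["query"])
        messages.append({"role": "tool", "content": result})
        reply = fake_model(messages)
    return reply["content"]

print(chat("latest llama.cpp release"))
```

So “can this model search the web?” is mostly a question about the app around it, not the weights.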