• theunknownmuncher@lemmy.world
      cake
      link
      fedilink
      arrow-up
      1
      ·
      10 days ago

      I run 27b at q8 with unquantized KV cache and 256k context on two Instinct MI60 GPUs. Definitely the best model that I have been able to run locally at a reasonable speed. 35b generates tokens as fast as you’d expect from any cloud provider. 27b is slower than 35b, of course, but token generation is still faster than my reading speed and suitable with coding agents.