llama.cpp: don't sleep on --split-mode tensor

robber@lemmy.ml · edited 21 days ago

llama.cpp: don't sleep on --split-mode tensor

robber@lemmy.ml · 21 days ago

A lot has been said, but to add to the list I’d say it gives them access to quite a large pool of free testers.

LLM architectures and optimization techniques change rapidly, and when a lab releases open-weight models, a lot of enthusiasts will evaluate them for free, help implement support in inference engines, catch bugs, etc. (and in turn, ofc, get a new model to run for free, so it’s at least somewhat symbiotic).

We saw this quite clearly when Alibaba released Qwen3-Next, a somewhat undertrained but still useful model that introduced the architecture their latest models now use “in production” (including their paid “Max” models).

robber@lemmy.ml · 2 months ago

Gemma 4 is here

robber@lemmy.ml · 2 months ago

Global sustainability rules???

robber@lemmy.ml · 3 months ago

I don’t follow the discussions on this topic very closely, but as I understand it, there are different ways to achieve the goal, and all of them impact quality to some extent. Heretic is discussed as one of the SOTA methods. The README posted above states the following, so it seems that Heretic is some sort of next-gen abliteration:

It combines an advanced implementation of directional ablation, also known as “abliteration” (Arditi et al. 2024, Lai 2025 (1, 2)), with a TPE-based parameter optimizer powered by Optuna.

robber@lemmy.ml · 3 months ago

Smaller qwen3.5 models released

robber@lemmy.ml · edited 3 months ago

Yeah I enjoy it as well. Just in case you missed it - a fix was merged into llama.cpp two days ago which is said to improve quality.

Edit: I stand corrected - the fix for the issue you’re experiencing has not yet been merged.

robber@lemmy.ml · 3 months ago

Qwen3-Coder-Next

robber@lemmy.ml · 7 months ago

Relevance of GPU driver version for inference performance

robber@lemmy.ml · 8 months ago

Magistral-Small-2509 by Mistral has been released

robber@lemmy.ml · edited 8 months ago

Qwen3-Next with 80b-a3b parameters is out

robber@lemmy.ml · 11 months ago

Do you quantize models yourself?

robber@lemmy.ml · 2 years ago

Migrated my self-hosted Nextcloud to AIO and I absolutely love it