• 9 Posts
  • 4 Comments
Joined 3 years ago
Cake day: November 25th, 2022


  • A lot has been said, but to add to the list I’d say it gives them access to quite a large pool of free testers.

    LLM architectures and optimization techniques change rapidly, and when a lab releases open-weight models, a lot of enthusiasts will evaluate them for free, help implement support in inference engines, catch bugs, etc. (and in turn, ofc, get a new model to run for free, so it’s at least somewhat symbiotic).

    We saw this quite clearly when Alibaba released Qwen3-Next, a somewhat undertrained but still useful model that introduced the architecture their latest models now use “in production” (including their paid “Max” models).




  • I don’t follow the discussions on this topic very closely, but as I understand it, there are different ways to achieve the goal, and all of them impact quality to some extent. Heretic is discussed as one of the SOTA methods. The README posted above states the following, so it seems that Heretic is some sort of next-gen abliteration.

    It combines an advanced implementation of directional ablation, also known as “abliteration” (Arditi et al. 2024, Lai 2025 (1, 2)), with a TPE-based parameter optimizer powered by Optuna.
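
For anyone wondering what "directional ablation" means in practice, here is a rough sketch of the core idea as I understand it: take a unit direction r and remove it from a weight matrix's output, W' = W - weight * r rᵀ W. This is not Heretic's actual code. The function names, the plain-list matrix layout, and the `weight` knob are all hypothetical and only for illustration; `weight` stands in for the kind of parameter an optimizer could tune.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def ablate_direction(W, direction, weight=1.0):
    """Return W' = W - weight * r r^T W for the unit vector r.

    W is a d_out x d_in matrix given as a list of rows (hypothetical layout).
    With weight=1.0 the outputs of W' have no component along r.
    """
    r = normalize(direction)
    d_out, d_in = len(W), len(W[0])
    if len(r) != d_out:
        raise ValueError("direction length must match the number of rows of W")
    # proj[j] = (r^T W)[j]: how strongly column j writes along r
    proj = [sum(r[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    # Subtract the (scaled) rank-one component r * proj from W
    return [
        [W[i][j] - weight * r[i] * proj[j] for j in range(d_in)]
        for i in range(d_out)
    ]


def component_along(W, direction):
    """Return r^T W, i.e. what each column of W still writes along r."""
    r = normalize(direction)
    return [sum(r[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    r = [1.0, 1.0]
    print(component_along(W, r))                       # before ablation
    print(component_along(ablate_direction(W, r), r))  # after ablation
```

With `weight=1.0` the direction is projected out completely; smaller values remove only part of it. That trade-off between removing the behavior and preserving quality is presumably the sort of thing the Optuna-based search in the README is exploring, though how Heretic actually parameterizes it is not covered by this sketch.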