ejs

ejs@piefed.social · 8 days ago

lol they already support running local models. wtf is the distro gonna do…? pre-install llama.cpp? this is so silly to me that people are resigning over this, too.

ejs@piefed.social · 19 days ago

Honestly it heavily depends on the use case, in terms of making the model better and choosing between RAG/FT. The most important thing to consider is what sort of changes you want to make to the model. FT is still a good choice if you’re looking for: strict output formatting (json/yaml/…) and refining for highly specific, narrow domain tasks. RAG is better for knowledge freshness, having source citations, and greatly lowers hallucinations.

RAG will inflate your context windows (more tokens) at inference time, so slower responses and requiring more energy at compute, whereas fine-tuning takes a ton of gpu compute up front (but retains smaller token counts at inference). If you’re doing 100,000 prompts a day, and only need to train once, FT makes more sense; if you’re doing 100 prompts a day and your knowledge database is constantly changing, RAG makes the most sense.

It’s hard to give a formalized estimate on energy efficiency: fine-tuning and getting to a certain training accuracy can take some undeterminate amount of time (and money on rented GPU compute), but could be a better choice if you think that up-front cost will be paid off over time if you use the model very frequently and only fine-tune once. On the other hand, going the RAG route will have an absolutely free up front compute (energy) cost, but be slightly more at compute time due to more tokens.

What’s your specific task you’re considering for FT or no FT? This is the most important thing to choose.

ejs@piefed.social · 20 days ago

I do AI research for school. I’m specifically interested in safety alignment. I have studied the original papers for different fine tuning methods: LoRA is typically the baseline and there exist many variants, notably Q-LoRA

In general, fine tuning is not practically beneficial for hobby level foundation models. It in fact comes with many disadvantages. Primarily, it is difficult to maintain the intelligence of the model and avoid overfitting.

If you are trying to adapt a model to a specific task, you are generally going to find more success with using RAG and just adding more context to the model that way. Don’t waste time and compute $$ on training.

ejs@piefed.social · 20 days ago

Has anyone compiled a list of where projects are moving to? I know many linux desktop applications are self hosting on gitlab, but i’ve also seen gitea and codeberg. If anyone has opinions about a preference, do comment. I have been enjoying self hosting gitea for my simple personal projects and for deploying simple web apps, all on $5 vps.

ejs@piefed.social · 30 days ago

I only buy heavyweight denim, when shopping go for at least 15-16 oz fabric. The thin stuff is great for warm weather but is not gonna last

ejs@piefed.social · 1 month ago

I suggest looking at llm arena leaderboards filtered by open weight models. It offers benchmarks at a very complete and statistically detailed level for models, and usually is quite up to date when new models come out. The new Gemma that just came out might be the best for 1x GPU, and if you have a bunch of vram check out the larger Chinese models