Consumer hardware is no longer a priority for manufacturers

Lee Duna@lemmy.nz · 5 days ago

Consumer hardware is no longer a priority for manufacturers

floquant@lemmy.dbzer0.com · 3 days ago

Just a reminder that “consumer” means human. They’re fucking over everyone in favour of “corporations” (aka a few select humans)

potoooooooo ✅️@lemmy.world · 3 days ago

Human…capital?

Mrkawfee@lemmy.world · 2 days ago

In early 2022, consumer GPUs [Nvidia] accounted for 47% of Team Green’s total revenue; by early 2026, that share had fallen to just 7.5%. Over the same period, data center revenue surged to $51.2 billion, representing roughly 90% of the company’s earnings.

Wow, that’s a complete wipeout of GPUs for home computing.

I wonder if the diminishing returns in gaming graphics has something to do with it as well.

neclimdul@lemmy.world · 4 days ago

Kind of makes sense really when you think about it. The vast majority of consumers have had all their wealth eroded over decades to the point no one can buy anything. Better to let the AIs buy everything now.

varjen@lemmy.world · 4 days ago

I see a future where all computing is done in the cloud and home computers are just dumb terminals. An incredibly depressing future. Customers not users is the goal.

Atropos@lemmy.world · 3 days ago

They can pry my hardware from my cold, dead hands.

Also my car with knobs. I will continue to maintain that car for as long as possible.

varjen@lemmy.world · 3 days ago

Cars without knobs shouldn’t be allowed. Knobs are awesome.

Buelldozer@lemmy.today · 4 days ago

This has been predicted and worked towards since the 90s.

lavander@lemmy.dbzer0.com · 4 days ago

ChromeOS was exactly mean for that

dhork@lemmy.world · 4 days ago

This is yet another thing I blame on American Business sacrificing itself on the altar of Shareholder Value. It’s no longer acceptable for a public business to simply make a profit. It has to grow that profit, every quarter, without fail.

So, simply having a good consumer product division that makes money won’t be enough. At some point some executive will decide that he can’t possibly get his bonus if that’s all they do, and decide they need to blow it all up to chase larger profits elsewhere.

Maybe we need a small, private company to come along and start making good consumer hardware. They still need components, though, so will have to navigate getting that from public companies who won’t return their calls. And even once they are successful, the first thing they will do is cash out and go public, and the cycle starts again.

UnspecificGravity@piefed.social · 4 days ago

I’m looking forward to cheap Chinese video cards that out perform Nvidia shit for 1/4 the price.

Lfrith@lemmy.ca · 4 days ago

Hopefully Linux supported. That’s the main selling point of AMD GPUs right now for me, since there’s less problems with trying to get stuff like HDR running on it than NVIDIA.

I wonder why China is still for the most part ignoring Linux in favor of Windows. Like to update 8bitdo controllers they only provide a Microsoft program and no Linux version.

You’d think they’d be rushing towards pushing Linux adoption.

Joe@discuss.tchncs.de · 4 days ago

That’s capitalism for you. But also Linux, where it’s typical to upstream hardware support and rely on existing ecosystems rather than release addon drivers or niche supporting apps.

China has made some strategic investments in Linux over the years though – often domestically targeted, like Red Flag Linux, and drivers for chinese hardware, etc.

Lfrith@lemmy.ca · edit-2 4 days ago

Linux seems to be common to run things like servers, but is that the case for consumer hardware?

When I’ve looked at peripherals like keyboards and controllers linux support has been lacking. Of course, for keyboards I went out of my way to get qmk compatible ones to use via and vial instead so I dont need to run an exe of unknown origins to remap or update the firmware.

And how is it for games. Is there more of a push to support Linux for their games? Since like Genshin Impact they only officially support Windows. There’s work around with anime launcher which disables the DRM, but I wouldn’t consider that Linux support with it risking a ban. They have their own version of Finals now and Ive wondered if that has Linux support or at least have the DRM work with wine type methods instead of the approach Valorant took.

Joe@discuss.tchncs.de · 4 days ago

There is plenty of consumer hardware that is supported on Linux, or will be as soon as a kernel developer gets their hands on it, reverse engineers the protocol if necessary, and adds support. For things like keyboards, there are often proprietary extensions (eg. for built-in displays, macros, etc.). It pays to check for Linux support before buying hardware though. Sometimes it’s not the kernel drivers, but supporting software (eg. Steam input) that might not support it.

First class vendor support for Linux is more common for niche/premium hardware designed in the west, than cheap chinese knockoffs that follow it. Long term customer support is not their strong suit.

Lfrith@lemmy.ca · 4 days ago

Linux has been good about getting hardware working.

My wonder was more what is their level of native program support for Linux. Like 8bitdo to update firmware and set up extra profiles requires the Windows program to set up, but as a simple controller it will work on Linux just no real way to do extra stuff unless you dual boot.

moopet@sh.itjust.works · 4 days ago

What do you mean lacking support for keyboards and controllers? Maybe for doing weird custom stuff like RGB, but for anything else they’re standard HIDs and will work with anything, no “support” needed. You can plug a USB keyboard and mouse into your phone and it’ll work if you want.

I’m currently playing Clair Obscur on linux through steam with a cheap fake xbox controller I got off ebay, and it works perfectly. I’m using an Nvidia card too, and I haven’t had to do any customisation or anything.

Easy anti-cheat won’t work, so Valorant/Fortnite, etc. are out of the question for now, but any games that don’t use that kind of malware are probably fine.

Lfrith@lemmy.ca · 4 days ago

I’m talking more about software and program level support. Not whether hardware itself is working on Linux, which Linux has been good at.

Like software to be able to update firmware on controllers, which doesn’t work on Linux. Controller itself will work. 8bitdo to update firmware and set extra profiles they only support windows.

So more about what their level of native Linux support is so consumers get the same level of extra features as Windows users.

moopet@sh.itjust.works · 4 days ago

Ah, makes sense. You’re right about firmware updaters, and I don’t know if I’d trust one running under Wine anyway tbh. Who knows what weird system calls they make assuming you’re running Windows 95 or whatever.

Lfrith@lemmy.ca · 4 days ago

Yeah, when I updated my controller I did it on another device I have windows dual booted on with nothing important on it just for that purpose.

ulterno@programming.dev · 4 days ago

But also Linux, where it’s typical to upstream hardware support and rely on existing ecosystems rather than release addon drivers or niche supporting apps.

Still possible though, right?
It does afterall support out of tree device drivers now.

Joe@discuss.tchncs.de · 4 days ago

Sure… but why would el cheapo hardware want/need to support proprietary drivers? Now, for premium hardware and software, they might still want vendor lock-in mechanisms… So unless you absolutely have to, you should avoid hardware on Linux that needs proprietary drivers.

ulterno@programming.dev · 4 days ago

So either way, it make it better to support Linux over MS Windows.

UnspecificGravity@piefed.social · 4 days ago

Most of the handheld consoles they sell run Linux of some variety. It’s just a question of what is marketable.

Captain Aggravated@sh.itjust.works · 4 days ago

China has no need for open source because they steal everything anyway.

Lfrith@lemmy.ca · 4 days ago

My hope for open source is that if something sketchy is pushed there might be a chance to catch as opposed to a proprietary approach where nobody has a chance of knowing what is going on.

Captain Aggravated@sh.itjust.works · 4 days ago

Likewise. Don’t expect it from China though.

Lfrith@lemmy.ca · 4 days ago

Yeah don’t have my hopes up. Without it I don’t plan to give their GPUs a shot, since they aren’t saviors either with their state sponsored attack on notepad++ as a recent example. Just a potential hardware supplier.

So despite how bad hardware supply might get for consumers there’s still a level of caution I have and would need some level of a trustless system in place.

Otherwise I’d just opt for old PC hardware like retro console players have been doing for decades.

IratePirate@feddit.org · 4 days ago

Great. Now they’re building infrastructure and industry atop a stolen Trojan Horse, which may still bite them the moment the little oange man tells Nadella to flip the switch.

Psythik@lemmy.world · 4 days ago

I hope you’re right because Intel and AMD still can’t compete with high end Nvidia cards, and that’s how we ended up with a $5000 5090.

muusemuuse@sh.itjust.works · 4 days ago

AMD can already beat nvidia at the price tiers most people actually buy at, and Intel is gaining ground way faster than anyone expected.

But outside of the GPU shakeup, I could give a shit about Intel. Let China kill us. We earned this.

drev@lemmy.dbzer0.com · 4 days ago

FIVE THOUSAND?!

Jesus nun-fucking Christ, what an absolute scam. I bought a 1070 for $220 in the first few months after release. Guess I’ll just have to hope it can run for another 10 years…

Psythik@lemmy.world · 1 day ago

Tell me about it. I also paid about the same for a 1070 back in 2016 and it lasted me all the way until 2022, when I finally decided to be a sucker and pony up the $1600 for a brand new 4090 at launch. Which as insane as that is, I’m glad I did because now 4090s go for about $3K used!

If you didn’t buy a new GPU by mid 2023, you’re pretty much stuck with what you have for the foreseeable future, given how insane prices are now with no signs of slowing down.

JohnEdwa@sopuli.xyz · 4 days ago

We also partly ended up with the 5k 5090 because it’s just the TITAN RTX of the 50xx generation - the absolute top of the line a card where you pay 200% extra for that last +10% performance.
nVidia just realized few generations back that naming those cards the xx90 gets a bunch of more people to buy them, because they always desperately need to have the shiniest newest xx90 cards, no matter the cost.

EndlessNightmare@reddthat.com · 4 days ago

Nvidia cards are more powerful, but even if others never catch up they could still be solidly “good enough” for gaming. I have a newer Nvidia card and my computer is feels so wildly overbuilt. The only thing I wish I had more of was SSD space, but that’s a different problem.

Unless you’re a professional competitive gamer, in which case this is actual work equipment, the difference in performance between medium-tier and upper-echelon is probably not worth it for the average consumer.

Ilixtze@lemmy.ml · 4 days ago

AMERICAN manufacturers, just waint until the Chinese industries swoop in to fill the gap. I seriously feel America just wants to kneecap itself.

foodandart@lemmy.zip · 4 days ago

Wants to kneecap itself?

Dude, the US is going full seppuku and we’re going to gut ourselves on the floor.

boogiebored@lemmy.world · 4 days ago

“banned for security concerns”

Ilixtze@lemmy.ml · 4 days ago

Not a problem for me; I’m not in America, I own a Huawei phone and a Huion Tablet.

errer@lemmy.world · 4 days ago

Hard to swoop in with massive tariffs. The few players that remain will just charge a lot more…it’ll become the rich lucky few who can afford their own hardware.

Mihies@programming.dev · 4 days ago

No such tariffs in the EU 🥹

emergencyfood@sh.itjust.works · 4 days ago

TJA!@sh.itjust.works · 4 days ago

The US is not the only place to sell to

Ilixtze@lemmy.ml · 4 days ago

The rest of the world will be fine.

brucethemoose@lemmy.world · 4 days ago

I mean, I’d kill for a Chinese GPU. But software lock-in for your Steam back catalog is strong.

Also, have you been watching all the Chinese GPU announcements? They’re all in on datacenter machine learning ASICs too.

Ilixtze@lemmy.ml · 4 days ago

There is already a lot of good Chinese DDR 5 memory on the market and it’s a matter of time before Chinese GPU’s and CPU’s proliferate. I remember people in the west global north were sceptic about the viability of Chinese electric cars ever existing just 5 years ago; Elon even laughed at the possibility. Tables turn fast when you have industrial capacity and central planning.

brucethemoose@lemmy.world · 4 days ago

Chinese electric cars were always going to take off. RAM is just a commodity; if you sell the most bits at the lowest price and sufficient speed, it works.

If you’re in edge machine learning, if you write your own software stacks for niche stuff, Chinese hardware will be killer.

But if you’re trying to run Steam games? Or CUDA projects? That’s a whole different story. It doesn’t matter how good the hardware is, they’re always going to be handicapped by software in “legacy” code. Not just for performance, but driver bugs/quirks.

Proton (and focusing everything on a good Vulkan driver) is not a bad path forward, but still. They’re working against decades of dev work targeting AMD/Nvidia/Intel, up and down the stack.

Ilixtze@lemmy.ml · 4 days ago

But i feel it’s not a matter of the Industry adapting into an entirely different ecosystem. As in, i don’t think that China will be taking over the computer industry. I feel it will be more of an issue of giving American companies and their anti-consumer practices something they haven’t had during their lifetimes: Real competition. I feel a lot of attitudes could change once they are in an ecosystem where they don’t have the luxury of monopolies and closed environments and i feel we are long overdue for having new players in this difficult field. It’s not about being a China shill either but in the end competition is good for the consumer. It’s concerning that all American tech industries are in bed with each other and also in bed with a government bent in global control and totalitarian surveillance. I don’t think Chinese manufacturers could be exempt from these dangers but at least it will give consumers the possibilities to pick their poison.

Also, GPU and graphics standards have changed in less than decades. We can still play old games in new software. AAA Developer models are clearly dying and new standards for Indie and AA development are emerging. Some of the hottest games this year could be defined as made by indie studios. So instead of hitting a wall i feel gaming in general could be moving into a new paradigm and i sure as hell wish that paradigm is not cloud computing.

I am not a dedicated gamer. I am from south America and I am playing Expedition 33 with full graphics on a 10 year old, entry range GPU on an old AMD CPU with 32 gigs of DDR 4 memory. And I’m having fun! And this rig works great for my job with a variety of open sourced and pirated software. I don’t need the latest and the greatest. I just need something that gives me results at an affordable price. Lets say that for the next 5 years that might be the new standard as the industry self corrects.

BackgrndNoize@lemmy.world · 4 days ago

Do you think they won’t just ban the Chinese products lol, this ain’t a democracy bud

Ilixtze@lemmy.ml · 4 days ago

I’m not American so I ain’t part of your non-democracy.

IratePirate@feddit.org · 4 days ago

America doesn’t. The Russian asset in the White House and its brainwashed minions do.

Matty Roses@lemmy.today · 4 days ago

You guys voted him in, twice, with the popular vote the second time. Don’t pretend you don’t own him.

IratePirate@feddit.org · 4 days ago

Not an American here, nor a defender of what people did or didn’t vote for.

What I was getting at was the massive media manipulation from outside and inside and the obvious foreign control the Dear Leader is under. With the deadly combination of of poor education, widespread media illiteracy and pervasive social media, our democracies have become even more remotely-controllable for whoever is willing to put the money in.

Matty Roses@lemmy.today · 4 days ago

and the obvious foreign control

Yeah, there’s no need for this disproven conspiracy theory. Trump is 100% American, which is why they voted him in again. He’s just the version too stupid to not say the things out loud.

IratePirate@feddit.org · 4 days ago

“Disproven conspiracy”? In light of the multitude of connections between Trump (or his cronies) and the Kremlin that show up in the most recent batch of the Epstein files, I guess we’re on the verge of making that a proven conspiracy.

Until then, just looking at Trump’s track record is enough to convince me Trump is 100% enacting orders from Moscow. Why else would he

withholding and threaten to cut Ukraine off from American military aid.
drive a wedge between the US and its former Western allies.
give D0GE unfettered access to the government’s most sensitive information (including nuclear ones), only to then have Russian IP addresses try to connect?
gut USAID, thus putting more pressure on Europe through increasing refugee streams
and the ist goes on…

Matty Roses@lemmy.today · 4 days ago

You misspelled Mossad there

chunes@lemmy.world · 4 days ago

The US is not a democracy. Just like in Russia, the votes are for show.

ɯᴉuoʇuɐ@lemmy.dbzer0.com · 4 days ago

Sorry but no. As profoundly unfair and undemocratic the US system is, it’s still more democratic than Russia where any serious opposition is literally murdered in broad daylight.

“The votes are for show” – do you mean to say that Trump’s victory in 2016, Biden’s in 2020 and Trump’s again in 2024 were prearranged by the central “powers that be”?

Ilixtze@lemmy.ml · 4 days ago

Sorry to break it to you bud; But America has a Plutocracy problem; It’s not a question of Putin running the show, but the American legal system being unable to persecute crimes and corruption if it happens to be billionaires. So in essence the system is compromised.

IratePirate@feddit.org · 4 days ago

I don’t disagree on the longstanding corruption problem. Yet I uphold the claim that this does not just expose the system to domestic, but also foreign corruption and influence.

dantheclamman@lemmy.world · 3 days ago

This will not stop until the ultra-rich are destroyed as a class. They have constructed a parallel economy, and we are all their serfs. History shows this situation can’t last and the question is whether they can be parted with their wealth peacefully or not

Jason2357@lemmy.ca · 3 days ago

Generally speaking, the consumer market has been entirely eclipsed by business to business sales. The only entities with expendable cash are businesses.

Fred@lemmy.world · 4 days ago

Imma remember what Curcial and others are doing, so when the AI bubble pops I’ll skip on all their products.

brucethemoose@lemmy.world · 4 days ago

Also, this has been the case (or at least planned) for a while.

Pascal (the GTX 1000 series) and Ampere (the RTX 3000 series) used the exact same architecture for datacenter/gaming. The big gaming dies were dual use and datacenter-optimized. This habit sort of goes back to ~2008, but Ampere and the A100 is really where “datacenter first” took off.

AMD announced a plan to unify their datacenter/gaming architecture awhile ago, and prioritized the MI300X before that. And EPYC has always been the priority, too.

Intel wanted to do this, but had some roadmap trouble.

These companies have always put datacenter first, it just took this much drama for the consumer segment to largely notice.

Bakkoda@sh.itjust.works · 4 days ago

Consumer sales are very very trackable. Off channel bulk sales can be very hard to verify and I’m sure that’s not being used to prop up valuation. Not at all.

anon_8675309@lemmy.world · 3 days ago

With the exception of a few tasks most modern hardware should last a while doing everyday tasks. If you’re due for an upgrade do it and then get off the consumerism train for a while. If you’ve got something within the past two or three years you’re set for a while.

Korkki@lemmy.ml · 4 days ago

The silver lining is that hardware performance gains have been so minor from generation to generation that upgrading isn’t really that important anymore. Like if i upgrade from next generation equivalent GPU it would give like 8% more fps… and it costs like 1,5k… No thanks.

cmnybo@discuss.tchncs.de · 4 days ago

You used to get a fairly significant upgrade ever few years for about the same cost as the old hardware. Transistors aren’t really getting much smaller anymore, so more performance needs a bigger die and costs more money.

foodandart@lemmy.zip · 4 days ago

Is Moore’s Law being resurrected?

Korkki@lemmy.ml · 4 days ago

Transistor size downscaling is pretty much done. Also mosfets can’t much improve in this race anymore. We would need a new computing paradigm to see manufacturing cost reductions or major performance leaps. For consumers thats still years away.

amorpheus@lemmy.world · 4 days ago

Probably also a big reason why it’s less profitable - consumers are upgrading more and more slowly. In part because of the performance gains being smaller, in part because a lot of components are getting more expensive. In that way it’s a self fulfilling prophecy.

melfie@lemy.lol · edit-2 4 days ago

I’ve been looking into self-hosting LLMs, and it seems a $10k GPU is kind of a requirement to run a decently-sized model and get reasonable tokens / s rate. There’s CPU and SSD offloading, but I’d imagine it would be frustratingly slow to use. I even find cloud-based AI like GH Copilot to be rather annoyingly slow. Even so, GH Copilot is like $20 a month per user, and I’d be curious what the actual costs are per user considering the hardware and electricity cost.

What we have now is clearly an experimental first generation of the tech, but the industry is building out data centers as though it’s always going to require massive GPUs / NPUs with wicked quantities of VRAM to run these things. If it really will require huge data centers full of expensive hardware where each user prompt requires minutes of compute time on a $10k GPU, then it can’t possibly be profitable to charge a nominal monthly fee to use this tech, but maybe there are optimizations I’m unaware of.

Even so, if the tech does evolve and it become a lot cheaper to host these things, then will all these new data centers still be needed? On the other hand, if the hardware requirements don’t decrease by an order of magnitude, then will it be cost effective to offer LLMs as a service, in which case, I don’t imagine the new data centers will be needed either.

brucethemoose@lemmy.world · edit-2 4 days ago

This is not true. I have a single 3090 + 128GB CPU RAM (which wasn’t so expensive that long ago), and I can run GLM 4.6 350B at 6 tokens/sec, with measurably reasonable quantization quality. I can run sparser models like Stepfun 3.5, GLM Air or Minimax 2.1 much faster, and these are all better than the cheapest API models. I can batch Kimi Linear, Seed-OSS, Qwen3, and all sorts of models without any offloading for tons of speed.

…It’s not trivial to set up though. It’s definitely not turnkey. That’s the issue.

You can’t just do “ollama run” and expect good performance, as the local LLM scene is finicky and highly experimental. You have to compile forks and PRs, learn about sampling and chat formatting, perplexity and KL divergence, about quantization and MoEs and benchmarking. Everything is moving too fast, and is too performance sensitive, to make it that easy, unfortunately.

EDIT:

And if I were trying to get local LLMs setup today, for a lot of usage, I’d probably buy an AI Max 395 motherboard instead of a GPU. They aren’t horrendously priced, and they don’t slurp power like a 3090. 96GB VRAM is the perfect size for all those ~250B MoEs.

But if you go AMD, take all the finickiness for an Nvidia setup and multiply it by 10. You better know your way around pip and Linux, as if you don’t get it exactly right, performance will be horrendous, and many setups just won’t work anyway.

WhyJiffie@sh.itjust.works · 3 days ago

You can’t just do “ollama run” and expect good performance, as the local LLM scene is finicky and highly experimental. You have to compile forks and PRs, learn about sampling and chat formatting, perplexity and KL divergence, about quantization and MoEs and benchmarking. Everything is moving too fast, and is too performance sensitive, to make it that easy, unfortunately.

how do you have the time to figure all these out and keep being up to date? do you do this at work?

brucethemoose@lemmy.world · edit-2 3 days ago

As a hobby mostly, but its useful for work. I found LLMs fascinating even before the hype, when everyone was trying to get GPT-J finetunes named after Star Trek characters to run.

Reading my own quote, I was being a bit dramatic. But at the very least it is super important to grasp some basic concepts (like MoE CPU offloading, quantization, and specs of your own hardware), and watch for new releases in LocalLlama or whatever. You kinda do have to follow and test things, yes, as there’s tons of FUD in open weights AI land.

As an example, stepfun 2.5 seems to be a great model for my hardware (single Nvidia GPU + 128GB CPU RAM), and it could have easily flown under the radar without following stuff. I also wouldn’t know to run it with ik_llama.cpp instead of mainline llama.cpp, for a considerable speed/quality boost over (say) LM Studio.

If I were to google all this now, I’d probably still get links for setting up the Deepseek distillations from Tech Bro YouTubers. That series is now dreadfully slow and long obsolete.

melfie@lemy.lol · 4 days ago

Appreciate all the info! I did find this calculator the other day, and it’s pretty clear the RTX 4060 in my server isn’t going to do much though its NVMe may help.

https://apxml.com/tools/vram-calculator

I’m also not sure under 10 tokens per second will be usable, though I’ve never really tried it.

I’d be hesitant to buy something just for AI that doesn’t also have RTX cores because I do a lot of Blender rendering. RDNA 5 is supposed to have more competitive RTX cores along with NPU cores, so I guess my ideal would be a SoC with a ton of RAM. Maybe when RDNA 5 releases, the RAM situation will have have blown over and we will have much better options for AMD SoCs with strong compute capabilities that aren’t just a 1-trick pony for rasterization or AI.

brucethemoose@lemmy.world · edit-2 4 days ago

I did find this calculator the other day

That calculator is total nonsense. Don’t trust anything like that; at best, its obsolete the week after its posted.

I’d be hesitant to buy something just for AI that doesn’t also have RTX cores because I do a lot of Blender rendering. RDNA 5 is supposed to have more competitive RTX cores

Yeah, that’s a huge caveat. AMD Blender might be better than you think though, and you can use your RTX 4060 on a Strix Halo motherboard just fine. The CPU itself is incredible for any kind of workstation workload.

along with NPU cores, so I guess my ideal would be a SoC with a ton of RAM

So far, NPUs have been useless. Don’t buy any of that marketing.

I’m also not sure under 10 tokens per second will be usable, though I’ve never really tried it.

That’s still 5 words/second. That’s not a bad reading speed.

Whether its enough? That depends. GLM 350B without thinking is smarter than most models with thinking, so I end up with better answers faster.

But anyway, I’m get more like 20 tokens a second with models that aren’t squeezed into my rig within an inch of their life. If you buy an HEDT/Server CPU with more RAM channels, it’s even faster.

If you want to look into the bleeding edge, start with https://github.com/ikawrakow/ik_llama.cpp/

And all the models on huggingface with the ik tag: https://huggingface.co/models?other=ik_llama.cpp&sort=modified

You’ll see instructions for running big models on a 4060 + RAM.

If you’re trying to like batch process documents quickly (so no CPU offloading), look at exl3s instead: https://huggingface.co/models?num_parameters=min%3A12B%2Cmax%3A32B&sort=modified&search=exl3

And run them with this: https://github.com/theroyallab/tabbyAPI

melfie@lemy.lol · 4 days ago

Ah, a lot of good info! Thanks, I’ll look into all of that!

Xenny@lemmy.world · 4 days ago

Ai failed and now they are doing this to capture the compute market to then make their profit back through unscrupulous means.

hector@lemmy.today · 4 days ago

As I am told, there is no way these llm’s ever make their investments back. It’s like Tesla at this point. Whomever is paying the actual money to build this stuff is going to get hosed if they can’t offload it onto some other sucker. That ultimate sucker probably being the US taxpayer.

Analog@lemmy.ml · 4 days ago

Can run decent size models with one of these: https://store.minisforum.com/products/minisforum-ms-s1-max-mini-pc

For $1k more you can have the same thing from nvidia in their dgx spark. You can use high speed fabric to connect two of ‘em and run 405b parameter models, or so they claim.

Point being that’s some pretty big models in the 3-4k range, and massive models for less than 10k. The nvidia one supports comfyui so I assume it supports cuda.

It ain’t cheap and AI has soooo many negatives, but… it does have some positives and local LLMs mitigate some of the minuses, so I hope this helps!

melfie@lemy.lol · 4 days ago

Nice, though $3k is still getting pretty pricey. I see mini PCs with a AMD RYZEN AI MAX+ 395 and 96GB of RAM can be had for $2k, or even $1k with less RAM: https://www.gmktec.com/products/amd-ryzen™-ai-max-395-evo-x2-ai-mini-pc?variant=f6803a96-b3c4-40e1-a0d2-2cf2f4e193ff

I’m looking for something that also does path tracing well if I’m going to drop that kind of coin. It sounds like this chip can be on par with a 4070 for rasterization, but it only gets a benchmark score of 495 for Blender rendering compared to 3110 for even a RTX 4060. RDNA 5 with true RTX cores should drastically change the situation of chips like this, though.

brucethemoose@lemmy.world · edit-2 3 days ago

FYI you can buy this this: https://frame.work/products/framework-desktop-mainboard-amd-ryzen-ai-max-300-series?v=FRAFMK0002

And stick a regular Nvidia GPU on it. Or an AMD one.

That’d give you the option to batch renders across the integrated and discrete GPUs, if such a thing fits your workflow. Or to use one GPU while the other is busy. And if a particular model doesn’t play nice with AMD, it’d give you the option to use Nvidia + CPU offloading very effectively.

It’s only PCIe 4.0 X4, but that’s enough for most GPUs.

TBH I’m considering exactly this, hanging my venerable 3090 off the board. As I’m feeling the FOMO crunch of all hardware getting so expensive. And $2K for 16 cores with 128GB of ridiculously fast quad channel RAM is not bad, even JUST as a CPU.

Clam_Cathedral@lemmy.ml · 4 days ago

Honestly just jump in with whatever hardware you have available and a small 1.5b/7b model. You’ll figure out all the difficult uncertainties as you go and try to improve things.

I’m hosting a few lighter models that are somewhat useful and fun without even using a dedicated GPU- just a lot of ram and fast NVMe so the models don’t take forever to spin up.

Of course I’ve got an upgrade path in mind for the hardware and to add a GPU but there are other places I’d rather put the money atm and I do appreciate that it all currently runs on a 250w PSU.