TL;DR: The LLMs built by big tech AI companies have gobbled up all of our data, but the damage they have done to open source and free culture communities is particularly insidious. By taking advantage of those who share freely, they destroy the bargain that made free software spread like wildfire.


If you don’t want your code to be used by others, then don’t make it open source.
Do you understand how free software works? Did you read the post? I’d love to clarify, but I’m not going to rewrite the article.
Also - this conclusion is ridiculous:
That is absolutely not true. It doesn’t remove the copyright from the original work and no court has ruled as such.
If I wrote a “random code generator” that just happened to create the source code for Microsoft Windows in entirety it wouldn’t strip Microsoft of its copyright.
Sorry, I just got around to this message. That is the idea of the provenance: clearly, the canonical work is copyrighted. It is the version that has been stripped of its provenance via the LLM that no longer retains its copyright (because, as I pointed out, LLM outputs cannot be copyrighted).
That doesn’t make it “no longer copyrighted” though. The original copyright holder retains their copyright on it. I can’t see any court ruling otherwise.
The output of the LLM can be incorporated into copyrighted material and is copyright-free. I never claimed that the copyright on the original work was lost.
Yes. And this is kinda hand-wavy bullshit.
That’s not how it works. Your code is not “incorporated” into the model in any recognizable form. It trains a model of vectors. There isn’t a file with your for loop in there though.
I can read your code, learn from it, and create my own code with the knowledge gained from your code without violating an OSS license. So can an LLM.
Why is Clean-room design a thing then?