AI industry horrified to face largest copyright class action ever certified

Nilt

Ars Legatus Legionis
21,810
Subscriptor++
It is a bit more nuanced than that.

If the material being copied is strictly for in-class use and is pure research, then it is almost certainly fair use.

But if the material being copied is for public use by the class (e.g., a play or song), then it is not fair use.

And if the material being copied is from an existing textbook, then it is not fair use.
Sure, but that isn't what was said. They said teachers must have a license to use anything when they are "training human students". That is just plain wrong.
 
Upvote
4 (9 / -5)

JohnDeL

Ars Tribunus Angusticlavius
8,596
Subscriptor
Sure, but that isn't what was said. They said teachers must have a license to use anything when they are "training human students". That is just plain wrong.
As was your response.

Teachers and schools have been sued for copyright violations; the code you cited is not a get-out-of-trouble free card.
 
Upvote
18 (20 / -2)

Nilt

Ars Legatus Legionis
21,810
Subscriptor++
Weird how the two or three shills for the AI industry that regularly post comments about how the latest LLM just released today is already saving them so much time, and will definitely be the breakthrough that will prove all the doubters wrong, never post on stories about the copyright aspect. Either (a) they don't have a good counter argument or (b) they get AIs to write all their comments for them, and those AIs have been hardcoded not to respond to questions about copyright lawsuits.
Another distinct possibility is they're paid shills who are prohibited from discussing the case publicly because they're agents of the company in reality, even if not openly so. I don't necessarily think so but it'd also fit the facts so far.
 
Upvote
20 (21 / -1)

Charles Hunter

Smack-Fu Master, in training
69
What's interesting about the appeal argument is it boils down to "we couldn't possibly identify the owners of the training materials and nobody else can either (including lots of creators who have no idea we used their work) therefore it was OK for us to use it". It's circular reasoning.

Assuming the court concludes that AI training is covered by copyright law and that each unauthorised use constitutes a breach, then the task for the court is quite simple. How many distinct works was your AI trained on? That's N. How many licences did you obtain? That's M. Your fine is $150K*(N-M) which you will pay into a court-administered trust fund, from which rights holders will be paid as and when they come forward over time AND you will destroy your existing AI and recreate it using only those M works, plus such other materials for which you obtain licences.
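The arithmetic in that proposal is simple enough to sketch. A minimal illustration, assuming the $150K figure is the statutory ceiling per willfully infringed work and using made-up counts for N and M:

```python
# Sketch of the fine proposed above: $150K per distinct unlicensed work.
# STATUTORY_MAX is the US statutory-damages ceiling for willful
# infringement; the work/license counts below are hypothetical.
STATUTORY_MAX = 150_000

def proposed_fine(n_works_trained_on: int, n_licenses_obtained: int) -> int:
    """Fine = $150K * (N - M), paid into a court-administered trust fund."""
    unlicensed = n_works_trained_on - n_licenses_obtained
    return STATUTORY_MAX * max(unlicensed, 0)

# Hypothetical example: 7 million works trained on, 10,000 licensed.
print(proposed_fine(7_000_000, 10_000))  # 1,048,500,000,000 -> roughly $1 trillion
```

Even at modest assumed counts, the per-work multiplier is what makes the exposure existential for the defendants.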

I also haven't seen any discussion of "terms of service" breaches where robots.txt directives are simply ignored by AI crawlers. There has got to be scope in that for a separate class action.
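For context, the directives in question look like the sketch below. The user-agent strings are real crawler names (OpenAI's GPTBot, Anthropic's ClaudeBot, Common Crawl's CCBot); the rules themselves are illustrative, not any particular site's file. Compliance is purely voluntary, which is the crux of the complaint:

```text
# Illustrative robots.txt asking AI crawlers to stay out while
# allowing everyone else. Nothing enforces this; it is a convention.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```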
 
Upvote
29 (30 / -1)

PghMike4

Smack-Fu Master, in training
92
This is never going to happen. There is a trillion dollars invested in this stuff and Trump and Congress are going to find a way to make it legal and allow AI to fuck over every content creator, writer, artist, and so on. We are solidly in the command and control market economy now and nobody is going to allow 10,000 points to get wiped off the Dow. The billionaires are going to get their money.

The basic economic theory from the right is pretty close to wipe out all labor, go to a full asset economy, make money off of crypto, meme stocks, and various scams, turn Goldman Sachs into a rack of computers. We can always have prisoners pick our crops until we invent robots to do it - prison slavery is still legal in the US after all.
I don't think AI is going to work well enough to do anything to the labor market, but it does provide a way to steal lots of intellectual property, and unfortunately I think SCOTUS is eventually going to back this massive theft.
 
Upvote
19 (19 / 0)

JohnDeL

Ars Tribunus Angusticlavius
8,596
Subscriptor
Hell, Clippy would be better.
Are you sure about that?

1754704758619.png
 
Upvote
14 (14 / 0)

Missing Minute

Wise, Aged Ars Veteran
1,386
Edit: Ok, the quote is blocked which is fine. Regardless, saying this is theft whether by "AI" companies or individuals is such bullshit. Copyright infringement has been explicitly stated by SCOTUS and multiple other federal courts to not be equivalent to theft.
Right, because courts are the exclusive arbiter of the linguistic meaning of theft. You can absolutely call someone a thief for stealing your idea to wear a blue dress to prom.
 
Upvote
-8 (4 / -12)

Missing Minute

Wise, Aged Ars Veteran
1,386
So many are comparing napster and individuals that downloaded. Wrong comparison.
Napster SOLD/gave away the music. Most individuals who were sued were offering up videos/music to others. I'm not certain, but I believe that not a single user was sued that downloaded CR items without DRM, but did not provide it to others. There are lots of fair-use issues with that last one, but again, I do not believe that people were sued for that.
AI downloaded it, but does not provide it to others. It is only used by the AI.
I believe that this is fair use.

If that is not the case, then China, Russia, and others will start jumping for joy.
Where did you get the idea "that not a single user was sued that downloaded CR items without DRM"?
 
Upvote
9 (12 / -3)

pixelm11

Smack-Fu Master, in training
1
You know, they pay for engineers, they pay hundreds of billions for compute and energy, but they can't pay anything for the work of authors, musicians, artists, journalists. Put a little energy and capital into tech for licensing and problem solved. ASCAP does a pretty good job of compensating composers despite the volume of music.
 
Upvote
39 (39 / 0)

TC26

Wise, Aged Ars Veteran
163
Or, like early autonomous driving results, maybe this is just as good as it's ever going to be. It'll get stripped down and simplified and used for things like managing telephone "help" labyrinths and replace the robovoiced hard-wired mazes used now.

In fact, AI models are degenerating, and there does not currently exist a solution to that problem.

https://www.ibm.com/think/topics/model-collapse

https://www.ibm.com/think/topics/catastrophic-forgetting

Among other resources.

There is not really any logical path to AI improvement going forward. All the extant human-created data has already been stolen and trained on. We're already generating AI slop faster than humans can generate useful data, and those humans will not have any motivation to continue doing so anyway, due to the aforementioned theft of their work product. So from here on, AI models will be trained on their own slop, with predictably terrible results.
 
Upvote
33 (36 / -3)

TC26

Wise, Aged Ars Veteran
163
It is morally wrong to steal the creative work of millions of people to feed your industrial-creation machine in order to replace those people. The valuations of these companies are clearly based on the belief they will replace millions of workers and take a % of their salaries. Stealing their work without pay in order to replace them fucking sucks.

Hah, "morals"! Good one! What are you, 200 years old?
 
Upvote
-5 (2 / -7)

TC26

Wise, Aged Ars Veteran
163
All these shareholders in AI companies need to ask themselves, why can't the AI generate its own content by now? A 'thinking machine' that has to very expensively webcrawl and summarize the world's content repeatedly and still can't actually think for itself?

What kind of 'generative AI' can't generate its own content? Generative AI is a smoothie blender, not a farm. You have to feed it as much as it feeds you.

It's a tech demo and a mechanical turk, not a thinking machine. The economics don't even work.

All current AI implementations are just enormous averaging machines. They are trained on a set and, when queried, they return the average answer from their training. They are useless for creating anything new; they only "create" an average of whatever already exists, which was the work of humans. Unless their input is well curated (by humans), their output will just be more mediocrity, nothing novel or useful.
 
Upvote
19 (21 / -2)

TC26

Wise, Aged Ars Veteran
163
I would argue that making jokes about morals not existing is bad for society.
And I would reply that the disintegration of the concept of morality is what actually harms society, and observing this disintegration -- with humor or without -- is necessary if that decline is ever to be reversed.
 
Upvote
5 (8 / -3)
Where did you get the idea "that not a single user was sued that downloaded CR items without DRM"?
Did you choose to not read or include the IMPORTANT part in this?

I believe that not a single user was sued that downloaded CR items without DRM, but did not provide it to others
 
Upvote
-17 (0 / -17)

Derecho Imminent

Ars Legatus Legionis
16,259
Subscriptor
Upvote
11 (11 / 0)

SubWoofer2

Ars Tribunus Militum
2,550
This story is incredibly one-sided.

Did you even reach out to the plaintiffs at all?

Edit: no, seriously. The story cites Anthropic, then it cites a bunch of industry groups that back Anthropic. It doesn't cite the plaintiffs.

It's quite easy to find copyright holders who could be plaintiffs. Throw a stone; if it hits a writer, ask them. An example is here in Melbourne, Australia, where it turns out over 90% of the authors speaking at or members of an SF book discussion group have had their works assimilated by the Meta AI borg. List below. Some have very low sales, mere hundreds of copies. But still they were borged. The Australian Society of Authors has asked all affected writers to contact them.



Nova Mob members, friends, and guests borged into Meta's AI


Roll a die to choose the next word to build a sentence. Keep doing that 50 times to build a paragraph or page. What are the chances that you will accurately reproduce a section of a Harry Potter novel? About 98%, if you are one particular AI model.

But before naming that Artificial Intelligence model, and which novels are uncannily reproduced with no money going back to the writer, how do books get into the AI training set in the first place? If you are Meta, you use a database of pirated books and hoover it all up in its entirety, according to The Atlantic. Just like the Borg on Star Trek.

Turns out almost all the Nova Mob’s published members, friends, and our guests, are part of the borged data set that Meta ate for its training set.

Did LibGen have permission to reproduce the books of these writers?
Did Meta have permission to borg them up into its maw, to train its AI with?
Search for yourself:

Search LibGen, the Pirated-Books Database That Meta Used to Train AI

https://www.theatlantic.com/technology/archive/2025/03/search-libgen-data-set/682094/

“Millions of books and scientific papers are captured in the LibGen collection’s current iteration.” Including novels, stories, and non-fiction by all these people, I’ve checked:

Eugen Bacon, Max Barry, John Birmingham, Jenny Blackford, Russell Blackford, Sue Bursztynski, James Cambias, Trudi Canavan, Paul Collins, Jack Dann, Chris Flynn, Rob Gerrand, Kerry Greenwood, Lee Harding, Richard Harland, Robert Hood, Van Ikin, George Ivanoff, Paul Kincaid, Vanessa Len, Ken Liu, Sophie Masson, Bren MacDibble, Iain McIntyre, Sean McMullen, Andrew MacRae, Farah Mendlesohn, Meg Mundell, Shelley Parker-Chan, Hoa Pham, Gillian Polack, Jane Routley, Lucy Sussex, Shaun Tan, Keith Taylor, Kaaron Warren, Janeen Webb

 
Upvote
24 (24 / 0)

Tratios

Smack-Fu Master, in training
1
Perhaps they should have asked to use the material instead of stealing it, which is what they have already done. I remember working in a college library: copying more than 10 or 15 pages was a violation, and all the professors' "reserved class materials" were limited unless they had special permission. I understand what the AI companies want, but if that is the case then they need to pay to use it just like anyone else. They are basically arguing: we broke the law, but we needed to break the law because otherwise we could not financially develop our product, and now that we have broken the law, if you hold us responsible we cannot financially recover. That is a crazy, wild, and legally dubious argument.
 
Upvote
20 (20 / 0)

Gunman

Ars Scholae Palatinae
1,355
It's basic copyright law.

If you use someone's copyrighted works without permission for financial gain (which clearly they are) or in a way that diminishes the value of the original work (which they almost certainly are), or if you create new works that are derivative of the original work (which they are doing almost by definition), you have violated copyright law.

Fair use doesn't apply here because of the size and scope of the use.

Anthropic is screwed and so they should be.
Is it really financial gain if they've been bleeding money since day one with no end in sight? /s
 
Upvote
1 (5 / -4)

TVPaulD

Ars Tribunus Militum
2,005
Personally, if AI was a good thing, I’d be happy to cut it the same slack we do children. That is, allow it an educational exemption. It’s not like children don’t copy. What is the saying? Good artists borrow, great artists steal?

But I would have to be convinced AI served the public good, and it is pretty hard to believe that if it is owned by billionaires.

Ultimately, the government may have to nationalize AI labor like it does the broadcast spectrum. It is hard to imagine how it will support UBI otherwise.
I wouldn’t. Children are people. AI is not a person. AI is a part of machines, machines which are built and run by corporations and adults who are culpable for their actions.
Quite a bit different here; the end product is a neural net.
It’s not different at all. Both things involve computer data encoding source information. They just happen to involve encoding it in different ways.
It may have some weights tailored that remember a section of a book but by the same logic so would a person's brain.
No, that does not follow in any way, because humans are not machines and exist in nature. Neural nets are artificial, digital constructs that exist entirely in machines as a way to simulate a facsimile of how a brain works.
I think it would hold water to make sure they legitimately bought a copy/license to read each book, but it's probably a bit of a stretch to say the neural net itself is infringing.
It’s not even remotely a stretch. The model is built from the data fed into it. It creates a mass statistical model of all the data fed into it to distill that information down into a smaller form which can then later be decoded by prompting. Once created, the model itself is in effect simply a lossily compressed copy of the training data. Applying compression - lossy or otherwise - does not wash away the copyright. There’s plenty of more conventional lossy compression that uses statistical methods to encode the data. This is not a novel or controversial area, the difference is really just the scale and breadth.
There may be entire books that don't adjust the neural net at all, and now that they are layering synthetic data on top of it, that might even undo the original adjustment.
Even if we accept that premise, which is more debatable than your framing would suggest, then it is still incumbent upon them to prove it. If they can’t, then the fact they fed the data into the model at all is all anyone has to go on.
 
Upvote
25 (26 / -1)

Theemis

Wise, Aged Ars Veteran
128
Subscriptor
The AI industry should pay a reasonable license fee for the material used which is still protected by copyright.

Just as human beings should buy the books they read and the music they listen to, AI companies need to pay for the content they use and profit from.

Probably a percentage of their sales should go toward this. It is actually good for the industry in the long term, too, as there is a need for humans to continue producing quality, original content.
 
Upvote
10 (11 / -1)

DarthSlack

Ars Legatus Legionis
23,061
Subscriptor++
I don't think AI is going to work well enough to do anything to the labor market, but it does provide a way to steal lots of intellectual property, and unfortunately I think SCOTUS is eventually going to back this massive theft.

AI is very much going to disrupt the labor market, just not the way vendors and boosters think. How do we know? We've been here before with offshoring.

Much like hiring Indian companies to offshore US office work, CEOs across the country are looking at AI as a way to slash their payroll and cement their next bonus check. Also like hiring Indian companies to replace US workers, it's not going to go well. In some cases it will be a spectacular failure. But the CEOs driving this won't care because a) They've already nailed down their gargantuan bonuses and b) The failure will be the problem for the next CEO. Even if they stick around long enough for it to be their problem, they have a golden parachute to make sure that they don't actually suffer from their fuck-up.

How many CEOs did you see actually pay a price for screwing up offshoring?
 
Upvote
36 (36 / 0)