"Copyright today covers virtually every sort of human expression" and cannot be avoided.
Them waxing on about "disrupting" is just libertarian code. Their real philosophy is "make money now, consequences be damned."

This runs completely counter to the "move fast and break things" philosophy they all believe in.
The whole point is to "disrupt" social norms and/or the regulatory structure to profit before regulations can catch up with them. LLMs may have moved too slowly though - creator backlash has been quick and LLMs are still a novel toy rather than a necessary tool for life. It would be trivially easy to tell these companies "So what? Pay them." and move on. They haven't actually disrupted enough to make it painful for us to do that yet (mostly because LLMs look impressive but actually have fewer use cases than people think when you start trying to use them).
Google isn't creating new content based on what it crawls. Though it does do many illegal things, like stealing content from pages and showing it in search results, meaning people don't click on the link. Or the entire news thing. If this means Google gets punished and fined billions, well, it's about time.

My fear is that the way ChatGPT crawls websites and pulls data from them will be seen as not technically different from, say, Google indexing, which has always been opt-out and has indexed basically everything publicly available on a website. Ethically I agree it's a completely different matter given how ChatGPT is actually using the material, but on a legal level I'm worried it'll be hard to stop them.
Honestly, you just described OpenAI. So did the testimony. It is literally organized crime.

Organised crime: "Our entire business practice cannot operate within the confines of the law, so you should let us continue with it."
Why bother? Google and Facebook have demonstrated that stealing every news article while not paying for them works.

The NYT's owner has a market cap of around 7 billion USD. MSFT should just buy them with some of the loose change in the couch cushions in Redmond. Heck, MSFT could buy out every newspaper and book publisher and not break a sweat.
Do you also think nothing can be done about gun violence? Guns exist; the genie is out of the bottle. Too bad there is "nothing" that can be done to prevent gun violence.

The genie is already out of the bottle; it's too late to stop it. In five years these arguments will all look like Metallica complaining about Napster.
Can I use your car to drive to work, then give it back when I'm done? Same idea: I'm taking something of value from you.

Using it for training is fair use, sure, but when it regurgitates that training data, that's infringement.
The NYT doesn't want to get paid. Their terms are that anything trained on their data should be deleted. It's not for sale.

ChatGPT's training corpus is on the order of 10^12 words. OpenAI's annual revenue is on the order of 10^9 dollars. That's a tenth of a cent per word, or about $1k for your million words. This, of course, neglects any kind of operating costs, which I've seen estimated at 10^7-10^8 dollars/year just to support the infra for ChatGPT.
So, even if they do end up paying you, it's not going to end up being some huge windfall. In fact, all the big players (NYT, etc) are gonna take all the biggest slices of cake, and you'll get to have the crumbs. Possibly it will be enough to buy you and a date a nice dinner. Enjoy.
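The per-word arithmetic above can be checked with a quick sketch. Note the corpus and revenue figures are the comment's order-of-magnitude guesses, not audited numbers, and the million-word creator is hypothetical:

```python
# Back-of-envelope check of the revenue-per-training-word estimate.
corpus_words = 10**12      # rough ChatGPT training corpus size (order of magnitude)
annual_revenue = 10**9     # rough OpenAI annual revenue in USD (order of magnitude)

revenue_per_word = annual_revenue / corpus_words   # dollars of revenue per training word
your_words = 10**6                                 # a hypothetical creator's million words
your_share = your_words * revenue_per_word

print(f"${revenue_per_word:.4f} per word")          # prints "$0.0010 per word"
print(f"${your_share:,.0f} for a million words")    # prints "$1,000 for a million words"
```

Even before operating costs, a pro-rata split of all revenue across all training words yields roughly $1,000 per million words contributed, which is the "nice dinner, not a windfall" point.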
Are you getting paid to post this bullshit?

IP law does not grant you any protections as a human consumer of content. It grants a monopoly to copyright holders to create incentives for creating content. One could definitely argue that blocking AI training is contrary to the goal of incentivizing content creation.
You are literally saying that since it's hard/impossible without breaking the law, it's OK. The ends justify the means, eh?

Licensing training data is a non-starter. If that is required, it is the end of LLMs, since there will be no accessible, broad training data. Just consider training off the Internet: 133 million active domains. That's 133 million licenses you would need to acquire. That's just never going to happen.
Sure, you could declare some statutory royalty for training, but places like the NYT will fight that forever. And statutory royalties worked out so, so well in the music industry, where the labels run off with 90% of it. For sure the same thing would happen with training royalties: a few firms would run off with the lion's share of the money collected.
'Cause one of them is a human. The other is a business. Why do people pretend they are the same?

I initially had the same gut response as many others here, but the same logic applies to humans.
We grow up exposed to all sorts of copyrighted content and our culture and taste changes over the course of our lives as a result of this content.
The content any creative human generates is undoubtedly influenced by this exposure to copyrighted content, and we acknowledge this in our (extremely generous, IMHO) copyright laws.
Why is the content generated by a computer exposed to the same material (albeit likely a vastly larger subset) any different?
So if I steal textbooks and learn from them to pass the class, that's OK too? If I take the source code for some software you sell and tweak it enough to be legally distinct, that's OK too? If I read your diary and publish a book about a character's life identical to yours (but with an "e" on the end of the name), that's OK with you?

I consider using it for training to be fair use. Reproducing large portions verbatim should be licensed or eliminated from the models.
Just because it's hard doesn't mean the laws can be ignored. That just means the business model is a failure. Hence the comments about organized crime.

It is not domain holders that you need licenses from. It is all content creators. Anyone that produces content (like this text) has copyright on that content. There is just no way to even find someone to request a license from for most content. Say you scrape Facebook content. That's like 2 billion anonymous users. How do you license that? How would you even know if the posters actually own the content? It is not doable, I'd say, and the problem is much, much larger than finding 133 million domain holders.
I prefer the terms of the NY Times lawsuit: delete all AI trained on sets that stole their data. OpenAI deserves NOTHING. They have no bargaining position. Why should penalties be limited based upon their current income?

I never said that they could. However, obviously, they can't give creators more money than they have. So, creators can either accept the pittance or take their ball and go home. My point is that if you're a creator, and you're hoping to actually make reasonable money on such a deal, well, that pool of money doesn't look nearly as big when you consider how many swimmers there are.
Once again, AI is not human and these are in no way alike, which you know.

You know, anyone can read a book and write an analysis of it. They can even publish the analysis or sell access to it in its own right, without paying the author of the book they analyzed. That's fair use, right?
And later, if someone uses that analysis to produce a new product or service, the author of the book that was analyzed doesn’t get paid, right?
Training a model is like analyzing many books in aggregate. The resulting analysis does not contain the text of the original books. It’s definitely fair use to produce a private analysis of copyrighted works.
Now imagine you are OpenAI. You have this big analysis you’ve done, and you can take user prompts and use them along with it to generate text. You aren’t selling the analysis (model) you built. You aren’t selling anything your users produce with the model. You sell access to the tool.
If the tool generates infringing content, it does so in response to user input. In fact, the tool’s output is partially derived from the user’s input. OpenAI doesn’t put infringing content out there in public.
The examples of being able to craft prompts in order to extract a facsimile of training data? That’s being engineered out more and more each day.
So it seems to me that under fair use, these companies are permitted to train on whatever they like, and people whose content is used in training have no way of getting paid.
And liability for generated infringing content will likely rest with users of these tools.
It's been exhaustively proven that they are not using publicly available material. It's been proven they've stolen millions of NON-PUBLICLY available stories and images. And that's only the tip of the iceberg. That's just the very small 0.0001% of cases which had enough money/skill to prove it.

That is not how OpenAI works. Each stream from Spotify is at least one copy (probably several, if you count all the caches and stuff that you don't see).
You can't attribute the output from Chat GPT to any single source.
OpenAI is using publicly available material to train a model in order to generate completely NEW content.
What the NYT is arguing is that using publicly available content for AI training, as opposed to indexing it for search or learning from or commenting on it, should be illegal.
We're in new territory here, and I do not see this as open and shut, like most people here seem to do.
AI steals the works and learns from them. Then millions of copies of the AI are sold. Since the AI stole that and then is sold, that's millions of times the works are being sold without permission.

What exactly is the problem that you see magnified here? It is not the copying, as there are no millions of copies distributed. My feeling is that you see a problem with AI more than with the copying: that people can be replaced by machines.
I too see issues with this and I do think new legal and social frameworks might be required to handle this, but I also see great potential for all of humanity. I only stand to profit in the sense that I do believe that the AI tools can be used for the greater good. You seem to have a much darker view of this.
How am I? Theft is pretty cut and dried. If I charge money to show a Disney picture, Disney lawyers won't give me a pass or quibble. They will take my house as payment and throw me in jail. Why is this different because it's a tech company doing the stealing?

You are grossly oversimplifying the "stealing" part. We are talking about material that is made available to the public for free and that is being indexed by search engines, for example. What else you are allowed/not allowed to do with that content, that is the question here, and the legality of AI training is what is disputed.