We absolutely do, but we put ethical and legal limitations on the research we do.
I don't see the difference. We research chemical and biological weapons of mass destruction. What they do we absolutely do need to match, not to be able to use it, but in order to be able to defend against it.
No, OpenAI is arguably more capable than the DoD in this matter. The former has a chance of aiding the DoD in defending against automated foreign electoral interference; the latter, by itself, has none.
The DoD researches chemical and biological weapons. Not Pfizer. OpenAI isn't the DoD or a part of government.
How long? Depends on memory size and processing capacity. In theory it’s unlimited; vendors cap you, though. But the system could go on and on.
Yes, but how? And how long can you predict? Right now we're up to 128k tokens with GPT-4 Turbo. That's a book. It doesn't matter if the machine is parroting if it's parroting coherent thought, and if it's parroting coherent thought, how is that different from thinking?
Troll farms are already automated, using technology from US companies, like OpenAI.
It reminds me of what Oppenheimer said in the recent biopic. He didn't know whether the allies could be trusted with the bomb, but he knew the Nazis could not be.
What I find compelling is what the Russians did in 2016 with troll farms. If they can automate that, and if we cannot detect it, because of the way our electoral system works, they will choose our leaders.
If they can choose our leaders they can destroy NATO. If they can destroy NATO they have more countries they can start to chop pieces off. If they go far enough, we'll have a new world war. That's what I find compelling. And that's ignoring the "cyber" warfare capabilities of AI.
Edit: And I shouldn't need to mention that if there is another world war, in an age of nuclear weapons, there is a good chance none of us survive. That's it. World over. Everything dead. That's compelling to me.
Your existential crisis isn't an argument.
You would have a point if those movies decided elections, but they don't. This is reductio ad absurdum.
If, as the lawsuit alleges, it regurgitates someone else’s copyrighted material, neither you nor it are “making up stuff”. And it wouldn’t matter whether you share it with anyone, because the problem would be that the AI company shared it with you without having the permission to do so.
But it allows me to locally and privately make up neat stuff, for fun, that I don't share.
Facts can’t be copyrighted, but articles about them can. The authors of this article couldn’t successfully bring a copyright claim against someone repeating the facts of the case based on this article. They could, however, do so if someone copied a sufficiently large amount of text verbatim from their article without getting permission to do so.
One also forgets that copyright is supposed to aid them in continuing to make art.
Is news art? I'd say no. It's factual stuff.
Easy. The commercial organizations we're dealing with such as Google, Facebook, and Microsoft all have the pockets to invest in this, if it is anywhere near as important as you've indicated.
The New York Times is. That may not be their intent but if they win, that will be the effect. There will no longer be enough data to train a language model. How would you "teach" a robot to understand modern society if it could not learn from copyrighted sources? You could not. Meanwhile the Russians and Chinese would not give a fuck, and we would suffer the consequences.
Then pay for the right to use it. Respect that not everyone will agree to let you use it.
Like I said before. The amount of data necessary to train a language model does not exist without copyrighted data, just like you could not raise a child in modern society to understand modern society without exposure to copyrighted material.
I don't think it's ceding ethical decisions. If a human can learn from books, a machine should be able to as well. A human can memorize books and articles too if they read one enough times. In the case of AI, when that happens, it is legitimately an error. There is no malice.
I am not making that argument because I think the existential one is of more importance, and yes, survival must come before copyright infringement because while the latter might save Mickey Mouse, the former will save lives.
I don't think what I am arguing is an "existential crisis". It's reality. If NYT wins, AI will be effectively crippled if not illegal in the US, leaving us with no defense against our enemies. I don't like it, but I don't see good options here.
How many billions of documents does it take for a child to learn a language? How many hours of audio?
The New York Times is. That may not be their intent but if they win, that will be the effect. There will no longer be enough data to train a language model. How would you "teach" a robot to understand modern society if it could not learn from copyrighted sources? You could not. Meanwhile the Russians and Chinese would not give a fuck, and we would suffer the consequences.
It's fair use to create statistics from copyrighted material. What word is most likely next is what's recorded, not the text itself. When you predict word after word and get it right, sometimes it matches the original, but generally when that happens it's undesirable.
Yeah, no options, surely a company that got a $10B investment from MS and is supposedly valued at $80B has no options about how to access copyrighted material, no options at all.
If having access to copyrighted material is the difference between their $80B company existing or not, they can pay for that material. It's not a hard thing to do.
So you're saying a company worth 80 billion has no resources to pay for licenses for the material it is using?
That is literally impossible. You are arguing impossible standards which would effectively outlaw AI, and at the worst possible time. Authors are not compensated when children read a library book. Nor should language models that learn from them. They do not copy them. Memorization indicates an error in training where it happens and I am not at all convinced that's what's happened here.
As is often the case with copyright, it's not actually clear what the law is when your use doesn't involve mechanistically copying works verbatim. And even then, as noted by the article, it's still possible to copy millions of works verbatim and still fall under fair use.
"Do whatever you want, ethically and legally, in a monomaniacal pursuit of intelligent machines" isn't a philosophy I find particularly compelling.
How about these companies, which are massive commercial ventures and not the saviors of humanity, follow the law?
Machines aren't people. They're not children. I flat out reject any and all analogies that assume what's fine for one is fine for the other. And so does the law so far. A human can hold copyright, a machine cannot. Because they are not equivalent.
I don't think it's ceding ethical decisions. If a human can learn from books, a machine should be able to as well.
There is no malice because machines are incapable of motives and emotions. Corporations, being made of people, are. I think it's abundantly clear that OpenAI is not an ethical organization, and you'd be as naive to trust them as to announce that Google is not going to do any evil and therefore should hold everyone's data.
A human can memorize books and articles too if they read one enough times. In the case of AI, when that happens, it is legitimately an error. There is no malice.
You've made no convincing argument that allowing OpenAI to scrape the internet as they see fit will save lives. "But Russia!" isn't actually an argument. It's FUD along the lines of saying "the only way to stop a bad guy with a gun is a good guy with a gun" when people are discussing sensible gun regulations.
I am not making that argument because I think the existential one is of more importance, and yes, survival must come before copyright infringement because while the latter might save Mickey Mouse, the former will save lives.
I'm sorry, but what?
I don't think what I am arguing is an "existential crisis". It's reality. If NYT wins, AI will be effectively crippled if not illegal in the US, leaving us with no defense against our enemies.
"Do whatever you want, ethically and legally, in a monomaniacal pursuit of intelligent machines" isn't a philosophy I find particularly compelling.
How about these companies, which are massive commercial ventures and not the saviors of humanity, follow the law?
Sounds like we should have a court case!
As is often the case with copyright, it's not actually clear what the law is when your use doesn't involve mechanistically copying works verbatim. And even then, as noted by the article, it's still possible to copy millions of works verbatim and still fall under fair use.
To do what you're asking they would need to license everything in the training which is a large part of the internet. Not everything is properly attributed so you'd need to factor in plagiarism as well. You'd need to accurately attribute everybody's words, contact everybody, negotiate with everybody, and only then begin training. The logistics alone of making all those requests makes it impossible.
So you're saying a company worth 80 billion has no resources to pay for licenses for the material it is using?
It’s weird that you think it’s bad to kill a technology that causes harm to people. Like you think technology is more important than actual human happiness.
To do what you're asking they would need to license everything in the training which is a large part of the internet. Not everything is properly attributed so you'd need to factor in plagiarism as well. You'd need to accurately attribute everybody's words, contact everybody, negotiate with everybody, and only then begin training. The logistics alone of making all those requests makes it impossible.
And let's assume we figure out a way that we need less data. Maybe only what a child needs to learn. Children learn from copyrighted books. You would have to raise your hypothetical robot away from copyrighted sources. In modern society this isn't possible. Kids read books, watch movies, listen to radio. They don't memorize unless you make them read things many times, and when you do so you have wasted that child's valuable time. I don't see how it is any different here. You would kill this technology because you're upset about the economic effects it might have. I get that, and there will be consequences. OpenAI admits it. But also we do need to do this before others do because there are worse things than us losing our jobs, believe it or not.
"Laws can be unjust, therefore we should not advocate that companies should follow them" isn't an argument.
legality != morality
morality != legality
While it's easy to throw ire at megacorps, the exact same argument is used by asshats when saying "why didn't they just follow the law?" in regards to young black men being sent to prison for 20 years for having four joints on them during an unlawful detention.
If "follow the law" is your answer to literally any question, then you're no better than the "Rule of Law" Republicans. I suspect you have much higher "real" moral ground, but the argument is a slippery slope, at best, and disingenuous at worst.
If we, as a society (Western society) actually adhered to the law on the whole, and those who created and enforced laws did so fairly and without bias, you might have a valid point. As it stands, in Western societies and doubly so in the United States, that's poor rhetoric.
It's fair use to create statistics from copyrighted material. What word is most likely next is what's recorded, not the text itself. When you predict word after word and get it right, sometimes it matches the original, but generally when that happens it's undesirable.
Besides which, the amount of data necessary to train a language model, licensed to do so, would be impossible for any single entity to purchase. It's also likely not possible to filter the volume of text necessary to remove all copyrighted works.
The training works by feeding significant chunks of the internet into a model until it can predict the next word accurately. It's not possible to filter significant portions of the internet for copyrighted works completely. The technology doesn't exist. If that's the standard you have, then you outlaw AI (but only for the US, not for our enemies). This would be like chopping off your nose to spite your face.
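To make the "statistics, not the text itself" point above concrete, here is a toy sketch in Python. It is not how GPT-style models work internally (those are neural networks, not count tables), and the corpus and function names are made up for illustration. It only shows the basic idea of recording which word tends to follow which, and how predicting word after word can sometimes reproduce the source and sometimes diverge from it.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it and how often."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if none was seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, start, length):
    """Greedily predict word after word, starting from `start`."""
    out = [start]
    for _ in range(length - 1):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

# Made-up training text, purely for illustration.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)

print(predict_next(model, "the"))   # most common follower of "the"
print(generate(model, "sat", 4))    # a short greedy continuation
```

With this tiny corpus, starting from "sat" reproduces "sat on the" from the source and then picks "cat" rather than the original "mat", because "cat" follows "the" more often. With so little data, long verbatim runs are easy to hit, which is roughly the memorization concern being argued about in this thread.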
Your comment and the comment you referred to pretty much define the limits of the dispute. It will be interesting to see how all this shakes out.
Define publicly available?
Are Disney+ films publicly available because I can stream them?
Because they're not paying the Redditors, duh. Imagine hunting down every author who has ever put words on the internet. That's not possible. Everybody, even you and me, would have cause to sue. It would outlaw AI.
Apparently, all of Reddit's data was just 60 million a year.
I see its potential as well as the harm it can cause. The harm we see right now exists because our society is not ready for the consequences, not because the technology itself is evil somehow. I see these predictive models as tools. They can be used as weapons or shields, and we don't want to be in a position where the other guy has a capable weapon and we have no shield, and there can't be any human happiness if there are no humans left.
It’s weird that you think it’s bad to kill a technology that causes harm to people. Like you think technology is more important than actual human happiness.
There isn't enough to allow-list. Maybe Meta can do it with everything that's posted on Facebook, but if they did, such a model would only be able to generate Facebook posts.
No, the Ars ToS probably gives them rights over what we post here. And the Reddit ToS gives them rights over what we post there.
And if you don't know who the copyright holder of www.fenris_uy.com is, you just don't use that site in your training.
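For what the allow-list approach would look like mechanically, here is a minimal Python sketch. The host names and the `LICENSED_HOSTS` set are hypothetical placeholders, and the sketch assumes someone has already worked out which sites have a known rights holder who granted a license. That is the hard part the thread is arguing about, and the code does nothing to solve it.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: hosts whose rights holder is known and has licensed the content.
LICENSED_HOSTS = {"example.com", "licensed-news.example"}

def is_licensed(url, allowed=LICENSED_HOSTS):
    """True only if the URL's host is explicitly on the allow-list."""
    host = urlparse(url).hostname  # None when the string has no parseable host
    return host is not None and host in allowed

def filter_corpus(docs, allowed=LICENSED_HOSTS):
    """Keep only (url, text) pairs that come from allow-listed hosts."""
    return [(url, text) for url, text in docs if is_licensed(url, allowed)]

# Made-up documents, purely for illustration.
docs = [
    ("https://example.com/article", "licensed text"),
    ("https://www.fenris_uy.com/post", "unknown rights holder"),
    ("not a url", "no host at all"),
]
kept = filter_corpus(docs)
print(len(kept), "of", len(docs), "documents kept")
```

Anything whose host is not explicitly listed gets dropped, which is the "just don't use that site" rule. Whether enough text survives that filter to train a useful model is exactly the point in dispute above.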