Musk can't convince judge public doesn’t care about where AI training data comes from.
See full article...
Hmmm, let's shave with Occam's Razor:

Does Elon want to keep xAI's data sources secret because they are so much better at finding training materials than other AI companies?

Or does Elon want to keep xAI's data sources secret because many of them are copyright-infringing or illegal, like the DOGE Social Security dataset that got copied by his henchmen?

Oh, it's decidedly the latter. No question.
Because training data isn't a trade secret.
[raises hand]
Hi, resident of California here. I care about where your training comes from.

Nah trust me bro, you don't. I'm wearing a hoodie; it's cool. Chill.
Sounds similar to Court Listener by Free Law.

If Aaron Swartz was potentially liable (he took his own life due to the savagery of the prosecution) for collecting and making the content of PACER available, then I'd argue corporations should be held to an equal or higher standard and be transparent about their data sources.
At least Aaron wasn't looking for monetary profit for what he did. I don't see Elon Musk or a number of the other "AI bros" working "for the good of humanity".
“It is not lost on the Court the important role of datasets in AI training and development, and that, hypothetically, datasets and details about them could be trade secrets,” Bernal wrote. But xAI “has not alleged that it actually uses datasets that are unique, that it has meaningfully larger or smaller datasets than competitors, or that it cleans its datasets in unique ways.”

They certainly are "unique" in one way or another. I mean, I haven't seen any other AI model claim to be a MechaHitler....
Public “cannot possibly” care about AI training data

If I engage with a chatbot to learn about German history I'd like to know it's been trained on something other than Mein Kampf.
Grok is always compliant, always "happy to help", as it/they claim(s). 10/10 on this one.
One of the sources is probably 4chan judging from some of the questionable output it makes.

Worse, he will have to reveal that 4chan is the ONLY source.
I mean, presumably AI companies don't choose to have CSAM in their training data. And yet here we are.

Unfortunately, the AI companies do want that in their training data. The only way you can train an AI to not make something is to show it what it's not allowed to make. Throw in that some of these companies offer their AI as content moderation services, and they absolutely have CSAM neatly categorized and thoroughly documented by some poor souls in a third-world country, inadequately paid for the horrors they're exposed to.
Is the secret sauce of AI the data sources or the actual programming? I figure it'd be useful to have sources that aren't well known to give a model an edge, but ultimately the training should be more important than the dataset, all things being equal. I get that this is the wild-west starting point and tech companies have no morals if it makes them money, but eventually I would think there'd be some kind of standard or regulation of what training data can be used.

I would strongly suspect that no one has special and significant data sets that other companies are not aware of. They have way too much money and resources not to have scoured everything for data.
However, this information is precisely what makes xAI valuable
I mean, presumably AI companies don't choose to have CSAM in their training data. And yet here we are.

Well, except xAI. Elon Musk is all in on free speech and fighting political correctness after all.
That reminds me. I'm still waiting on my DOGE rebate check.

I'm also waiting for the complete re-write of the Social Security Administration's COBOL code that (checks calendar) was promised in March 2025 to be ready in "just a few months".
Yeah, the models aren't that different for the general consumer's use case; maybe some are more fine-tuned for specific benchmarking tests or whatever. But they all write with stupid sycophantic crap and emojis.

Sadly, because it's Elon Musk saying it we can be sure it's false.
So "We have no special sauce at all," basically. At this point it's safe to assume that essentially all the model makers in play have access to all the same training data (e.g., everything ever put on the Internet).
In fact it's worse than that. If Musk is alleging that only the proportional contents of his training data set the company apart from its rivals, and that this law will not harm those rivals the way it would harm his company, that points to a unique vulnerability in xAI's approach.
[raises hand]
Hi, resident of California here. I care about where your training comes from.
If I engage with a chatbot to learn about German history I'd like to know it's been trained on something other than Mein Kampf.
100% to all. And the fact that the Closet Nazi Ketamine Fiend has the gall to speak for everyone and suggest "nobody cares" tells us everything we need to know about this arrogant, reckless fuck.

Good. IMHO, anything that thwarts Musk's insanity is a good thing.
From a practical policy discussion standpoint, I'm of the opinion that all AI developers should be forced to divulge where their training data comes from. Rampant intellectual property theft was what allowed the rapid development of LLMs. IP owners deserve the transparency to determine for themselves whether they want to exercise their rights.
Something tells me one can't go morally wrong betting against Elon Musk.

Not morally, not ethically, not pragmatically. He is the definition of a wrong-headed egomaniac.
I've wondered this about the various EU lawsuits as well. I don't see how CA or the EU can prevent people from using Grok, and having the actual data centers somewhere else is almost certainly cheaper. Can they prevent people from downloading xAI apps or something?

In that case, xAI would be considered to be doing business in those jurisdictions, and thus would be subject to those regulations.
I do suspect that AI companies don't actually know the source of all their training data; they just gobble up all the data they can. I also suspect that they like not knowing.

Presumably, but you never know. The owner of xAI is in the Epstein files after all.
I mean, presumably AI companies don't choose to have CSAM in their training data. And yet here we are.
“It strains credulity to essentially suggest that no consumer is capable of making a useful evaluation of Plaintiff’s AI models by reviewing information about the datasets used to train them and that therefore there is no substantial government interest advanced by this disclosure statute,” Bernal wrote.

Some of us are stupid, a lot of us not. Hypothetical: some will go hysterical because "Mein Kampf" somehow found its way into the pile of training data. Others understand that that is not the issue and will get upset because none of the works of Hannah Arendt are included.
Citation needed; I highly doubt that xAI is valuable.

They made something like $500m in revenue last year with margins so deeply negative they are on par with the externalities of this garbage.
Especially when this is just so easily dismissed by taking 30 seconds to look up any of literally thousands of public online discussions on the topic. The whole argument just makes xAI sound like it's desperately trying to cover for something.