Musk can't convince judge public doesn’t care about where AI training data comes from.
See full article...
Hmmm, let's shave with Occam's Razor:

Does Elon want to keep xAI's data sources secret because they are so much better at finding training materials than other AI companies?

Or does Elon want to keep xAI's data sources secret because many of them are copyright-infringing or illegal, like the DOGE Social Security dataset that got copied by his henchmen?

Oh, it's decidedly the latter. No question.
Because training data isn't a trade secret.
[raises hand]
Hi, resident of California here. I care about where your training comes from.

Nah trust me bro, you don't. I'm wearing a hoodie; it's cool. Chill.
Sounds similar to Court Listener by Free Law.

If Aaron Swartz was potentially liable (he took his own life due to the savagery of the prosecution) for collecting and making the content of PACER available, then I'd argue corporations should be held to an equal or higher standard and be transparent about their data sources.
At least Aaron wasn't looking for monetary profit for what he did. I don't see Elon Musk or a number of the other "AI bros" working "for the good of humanity".
“It is not lost on the Court the important role of datasets in AI training and development, and that, hypothetically, datasets and details about them could be trade secrets,” Bernal wrote. But xAI “has not alleged that it actually uses datasets that are unique, that it has meaningfully larger or smaller datasets than competitors, or that it cleans its datasets in unique ways.”

They certainly are "unique" in one way or another. I mean, I haven't seen any other AI model claim to be a MechaHitler....
Public “cannot possibly” care about AI training data

If I engage with a chatbot to learn about German history I'd like to know it's been trained on something other than Mein Kampf.
Grok is always compliant, always "happy to help", as it/they claim(s). 10/10 on this one.
One of the sources is probably 4chan judging from some of the questionable output it makes.

Worse, he will have to reveal that 4chan is the ONLY source.
I mean, presumably AI companies don't choose to have CSAM in their training data. And yet here we are.

Unfortunately, the AI companies do want that in their training data. The only way you can train an AI to not make something is to show it what it's not allowed to make. Throw in that some of these companies offer their AI as content moderation services, and they absolutely have CSAM neatly categorized and thoroughly documented by some poor souls in a third-world country, inadequately paid for the horrors they're exposed to.
Is the secret sauce of AI the data sources or the actual programming? I figure it'd be useful to have sources that aren't well known to give a model an edge, but ultimately the training should be more important than the dataset, all things being equal. I get that this is the wild-west starting point and tech companies have no morals if it makes them money, but eventually I would think there'd be some kind of standard or regulation of what training data can be used.

I would strongly suspect that no one has special and significant data sets that other companies are not aware of. They have way too much money and resources not to have scoured everything for data.
However, this information is precisely what makes xAI valuable
I mean, presumably AI companies don't choose to have CSAM in their training data. And yet here we are.

Well, except xAI. Elon Musk is all in on free speech and fighting political correctness after all.
That reminds me. I'm still waiting on my DOGE rebate check.

I'm also waiting for the complete re-write of the Social Security Administration's COBOL code that (checks calendar) was promised in March 2025 to be ready in "just a few months".
Yeah, the models aren't that different for the general consumer's use case; maybe some are more fine-tuned for specific benchmarking tests or whatever. But they all write with stupid sycophantic crap and emojis.

Sadly, because it's Elon Musk saying it we can be sure it's false.
So "We have no special sauce at all," basically. At this point it's safe to assume that essentially all the model makers in play have access to all the same training data (e.g., everything ever put on the Internet).
In fact it's worse than that. If Musk is alleging that only the proportional contents of his training data set the company apart from its rivals, and that this law will not harm those rivals the way it would harm his company, that points to a unique vulnerability in xAI's approach.
[raises hand]
Hi, resident of California here. I care about where your training comes from.
If I engage with a chatbot to learn about German history I'd like to know it's been trained on something other than Mein Kampf.
100% to all. And the fact that the Closet Nazi Ketamine Fiend has the gall to speak for everyone and suggest "nobody cares" tells us everything we need to know about this arrogant, reckless fuck.

Good. IMHO, anything that thwarts Musk's insanity is a good thing.
From a practical policy discussion standpoint, I'm of the opinion that all AI developers should be forced to divulge where their training data comes from. Rampant intellectual property theft was what allowed the rapid development of LLMs. IP owners deserve the transparency to determine for themselves whether they want to exercise their rights.
Something tells me one can't go morally wrong betting against Elon Musk.

Not morally, not ethically, not pragmatically. He is the definition of a wrong-headed egomaniac.
I've wondered this about the various EU lawsuits as well. I don't see how CA or the EU can prevent people from using Grok, and having the actual data centers somewhere else is almost certainly cheaper. Can they prevent people from downloading xAI apps or something?

In that case, xAI would be considered to be doing business in those jurisdictions, and thus would be subject to those regulations.
I do suspect that AI companies don't actually know the source of all their training data; they just gobble up all the data they can. I also suspect that they like not knowing.

Presumably, but you never know. The owner of xAI is in the Epstein files after all.
I mean, presumably AI companies don't choose to have CSAM in their training data. And yet here we are.
“It strains credulity to essentially suggest that no consumer is capable of making a useful evaluation of Plaintiff’s AI models by reviewing information about the datasets used to train them and that therefore there is no substantial government interest advanced by this disclosure statute,” Bernal wrote.

Some of us are stupid, a lot of us not. Hypothetical: some will go hysterical because "Mein Kampf" somehow found its way into the pile of training data. Others understand that that is not the issue and will get upset because none of the works of Hannah Arendt are included.
Citation needed; I highly doubt that xAI is valuable.

They made something like $500m in revenue last year with margins so deeply negative they are on par with the externalities of this garbage.
Especially when this is just so easily dismissed by taking 30 seconds to look up any of literally thousands of public online discussions on the topic. The whole argument just makes xAI sound like it's desperately trying to cover for something.