Musk fails to block California data disclosure law he fears will ruin xAI

Fatesrider

Ars Legatus Legionis
24,977
Subscriptor
Hmmm, let's shave with Occam's Razor:

Does Elon want to keep xAI's data sources secret because they are so much better at finding training materials than other AI companies?

Or does Elon want to keep xAI's data sources secret because many of them are copyright infringing or illegal, like the DOGE Social Security dataset that got copied by his henchmen?
Oh, it's decidedly this. No question.

People in AI who do things legally don't CARE about this stuff:
xAI had tried to argue that California’s Assembly Bill 2013 (AB 2013) forced AI firms to disclose carefully guarded trade secrets.
because training data isn't a trade secret.
 
Upvote
36 (36 / 0)

Delerious

Ars Praetorian
599
Subscriptor++
If Aaron Swartz was potentially liable (he took his own life due to the savagery of the prosecution) for collecting the content of PACER and making it available, then I'd argue corporations should be held to an equal or higher standard and be transparent about their data sources.

At least Aaron wasn't looking for monetary profit for what he did. I don't see Elon Musk or a number of the other "AI bros" working "for the good of humanity".
Sounds similar to Court Listener by Free Law.
 
Upvote
3 (3 / 0)

WereCatf

Ars Tribunus Militum
2,830
“It is not lost on the Court the important role of datasets in AI training and development, and that, hypothetically, datasets and details about them could be trade secrets,” Bernal wrote. But xAI “has not alleged that it actually uses datasets that are unique, that it has meaningfully larger or smaller datasets than competitors, or that it cleans its datasets in unique ways.”
They certainly are "unique" in one way or another. I mean, I haven't seen any other AI model claim to be a Mechahitler....
 
Upvote
28 (28 / 0)

bushrat011899

Ars Scholae Palatinae
658
Subscriptor
I mean, presumably AI companies don't choose to have CSAM in their training data. And yet here we are.
Unfortunately, the AI companies do want that in their training data. The only way you can train an AI to not make something is to show it what it's not allowed to make. Throw in that some of these companies offer their AI as content moderation services, they absolutely have CSAM neatly categorized and thoroughly documented by some poor souls in a 3rd world country inadequately paid for the horrors they're exposed to.
 
Upvote
16 (17 / -1)

ranthog

Ars Legatus Legionis
15,240
Is the secret sauce of AI the data sources or the actual programming? I figure it'd be useful to have your sources that aren't well known to give a model an edge, but ultimately the training should be more important than the dataset all things being equal. I get that this is the wild west starting point and tech companies have no morals if it makes them money, but eventually I would think that there'd be some kind of standard or regulation of what training data can be used.
I would strongly suspect that no one has special and significant data sets that other companies are not aware of. They have way too much money and resources to not have scoured everything for data.
 
Upvote
17 (17 / 0)

momoisdabest

Smack-Fu Master, in training
35
Sadly, because it's Elon Musk saying it we can be sure it's false.


So "We have no special sauce at all," basically. At this point it's safe to assume that essentially all the model makers in play have access to all the same training data (e.g., everything ever put on the Internet).
In fact it's worse than that. If Musk is alleging that only the proportional contents of his training data set the company apart from its rivals and that this law will not harm its rivals in the same way it would harm his company, that points to a unique vulnerability in their approach.
Yeah, the models aren't that different for the general consumer's use case; maybe some are more fine-tuned for specific benchmarking tests or whatever. But they all write with stupid sycophantic crap and emojis.
 
Upvote
2 (2 / 0)

SubWoofer2

Ars Tribunus Militum
2,550
If I engage with a chatbot to learn about German history I'd like to know it's been trained on something other than Mein Kampf.

Likewise if I was seeking advice on how to engage with any human being, e.g. a memo to my boss, going shopping with my partner, commenting on a web posting. Knowing that the data is sourced from the chans would be ... quite relevant.
 
Upvote
8 (8 / 0)

MilanKraft

Ars Tribunus Angusticlavius
6,711
Good. IMHO, anything that thwarts Musk's insanity is a good thing.

From a practical policy discussion standpoint, I'm of the opinion that all AI developers should be forced to divulge where their training data comes from. Rampant intellectual property theft was what allowed the rapid development of LLMs. IP owners deserve the transparency to determine for themselves whether they want to exercise their rights.
100% to all. And the fact that the Closet Nazi Ketamine Fiend has the gaul to speak for everyone and suggest "nobody cares" tells us everything we need to know about this arrogant, reckless fuck.
 
Upvote
23 (23 / 0)

s73v3r

Ars Legatus Legionis
25,618
I've wondered this about the various EU lawsuits as well. I don't see how CA or the EU can prevent people from using Grok, and having the actual data centers somewhere else is almost certainly cheaper. Can they prevent people from downloading xAI apps or something?
In that case, xAI would be considered to be doing business in those jurisdictions, and thus would be subject to those regulations.
 
Upvote
9 (9 / 0)
Musk is arguing that these “trade secrets” give xAI a competitive edge, which it would lose if forced to divulge the info.

What competitive edge? My understanding is that Grok sucks it big time. Musk should be in favor of this law, and try to force OpenAI and others to comply as well, so he can steal their “secrets.”
 
Upvote
16 (16 / 0)

Random_stranger

Ars Praefectus
5,209
Subscriptor
100% to all. And the fact that the Closet Nazi Ketamine Fiend has the gaul to speak for everyone and suggest "nobody cares" tells us everything we need to know about this arrogant, reckless fuck.

He hired a PR group from France? (I think you meant gall, as in gallbladder)
 
Upvote
19 (19 / 0)
I do suspect that AI companies don't actually know the source of all their training data, they just gobble up all the data they can. I also suspect that they like not knowing.

I mean, presumably AI companies don't choose to have CSAM in their training data. And yet here we are.
Presumably, but you never know. The owner of xAI is in the Epstein files after all.
 
Upvote
18 (18 / 0)

Zeppos

Ars Tribunus Militum
2,864
Subscriptor
“It strains credulity to essentially suggest that no consumer is capable of making a useful evaluation of Plaintiff’s AI models by reviewing information about the datasets used to train them and that therefore there is no substantial government interest advanced by this disclosure statute,” Bernal wrote.
Some of us are stupid, a lot of us are not. Hypothetical: Some will go hysterical because "Mein Kampf" somehow found its way into the pile of training data. Others understand that that is not the issue and will get upset because none of the works of Hannah Arendt are included.

It makes sense to publish the source data. It forms the core identity of the LLM. It determines its character.

Consumers can't interpret the extensive list? With a lot of effort, some of us probably can make sense of it, but most of us are not going to. Luckily there is Ars Technica. You bet they will go through it all and provide a digestible summary.

Musk... society is more than consumers. Try to fit in... a bit.
 
Upvote
8 (8 / 0)

neffo

Wise, Aged Ars Veteran
197
Upvote
15 (15 / 0)
100% to all. And the fact that the Closet Nazi Ketamine Fiend has the gaul to speak for everyone and suggest "nobody cares" tells us everything we need to know about this arrogant, reckless fuck.
Especially when this is just so easily dismissed by taking 30 seconds to look up any of literally thousands of public online discussions on the topic. The whole argument just makes xAI sound like it's desperately trying to cover for something.
 
Upvote
5 (5 / 0)
It's amazing that he's spent hundreds of thousands of dollars to make his face and "hair" look like THAT. That's as good as it gets.

[Image attachment: 1772889990197.png]


Phew.
 
Upvote
6 (6 / 0)