LLMs believe false statements even after explicit warnings that they’re false

JohnDeL · 2026-06-01T12:09:14-0400

SraCet said:
Do these articles talk about the times when a dolphin or bird counted the number of Rs in the word strawberry... ?

Not specifically, no. But they do make the point that birds and dolphins can count and thus could conceivably count the number of Rs in the word strawberry.

SraCet · 2026-06-01T12:13:47-0400

JohnDeL said:
Not specifically, no. But they do make the point that birds and dolphins can count and thus could conceivably count the number of Rs in the word strawberry.

And I'm sure there are instances of LLMs appearing to understand numbers, the concept of numerically "less than," etc. just as much as those observations of birds and dolphins.

I just gave ChatGPT a short sequence of comma-separated numbers and asked how many times the number 1 occurred in that sequence and it nailed it.

So I guess ChatGPT is intelligent, right?

SraCet · 2026-06-01T12:22:38-0400

crmarvin42 said:
...
What LLMs can do is impressive, but it does not understand what it is doing. Or else it would have picked the right citations for the facts and claims it asserted. That is the rub here. Many of the linked citations were wrong, but the fact or claim they were erroneously being used to support was actually correct! The LLM produced the right fact, but with the wrong supporting evidence for the fact.
...

Also, BTW, I'll point out that LLMs have no idea why they say the things they do.

It's useless to ask an LLM why it said any particular thing. It doesn't know. Asking is a rookie mistake.

If it volunteers a citation for something, that's sort of like it asked itself why it said a thing. There's no reason to expect such a citation/explanation to be correct.

That proves nothing about whether or not it understood something it said while it was saying it.

DeeplyUnconcerned · 2026-06-01T12:23:04-0400

SraCet said:
I'd like to see you explain/defend this assertion.

What is "statistical" about backpropagation?

By "training method", I mean the whole training process, not just backprop. But it doesn't matter to the argument; you can assert that LLM training involves no statistics whatsoever and just fall back on the fact that the output is expressed as probabilities, which gets you your "LLMs use probability more than expert systems do" endpoint without getting distracted.

JohnDeL · 2026-06-01T12:24:12-0400

SraCet said:
And I'm sure there are instances of LLMs appearing to understand numbers, the concept of numerically "less than," etc. just as much as those observations of birds and dolphins.

I just gave ChatGPT a short sequence of comma-separated numbers and asked how many times the number 1 occurred in that sequence and it nailed it.

So I guess ChatGPT is intelligent, right?

Nice shifting of the goalposts.

Your original statement was:

SraCet said:
And yet we ascribe intelligence to animals (birds, dolphins, etc.) that can obviously not count the numbers of letters in words, etc.

I have merely provided evidence that your knowledge of the field is lacking.

SraCet · 2026-06-01T12:28:19-0400

DeeplyUnconcerned said:
By "training method", I mean the whole training process, not just backprop. But it doesn't matter to the argument; you can assert that LLM training involves no statistics whatsoever and just fall back on the fact that the output is expressed as probabilities, which gets you your "LLMs use probability more than expert systems do" endpoint without getting distracted.

Actually, it's pretty easy to argue that they don't even generate probabilities.

They produce a vector of numbers that humans convert to a probability distribution via softmax.

But they are not actually probabilities any more than my phone number is a probability.
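For readers following the softmax point, here is a minimal sketch of the conversion being described. The three scores are made-up illustrative values, not output from any real model, and the function is a plain-Python stand-in for what a framework would do.

```python
import math

def softmax(logits):
    """Map arbitrary real-valued scores to positive values that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a three-token vocabulary (illustrative only).
logits = [2.0, 1.0, -1.0]
probs = softmax(logits)

print(probs)       # same ordering as the raw scores
print(sum(probs))  # 1.0 up to floating-point rounding
```

The raw scores can be any real numbers. It is the softmax step that makes them non-negative and normalized, which is the part both posters agree on.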

SraCet · 2026-06-01T12:29:57-0400

JohnDeL said:
Nice shifting of the goalposts.

Your original statement was:

I have merely provided evidence that your knowledge of the field is lacking.

Yeah, I said dolphins and birds can't count the number of letters in a word, and you provided no evidence to the contrary.

Stop polluting the comment thread with this nonsense. Either post an instance of a dolphin or a bird actually counting the instances of a particular letter in a particular word, or end this distraction.

wildsman · 2026-06-01T12:33:33-0400

crmarvin42 said:
The piece that jumped out as a benchmark, and proof of lack of understanding was not the content area stuff, but the 100% broken and worthless citations. Something that the training set has mountains of examples for. This is, arguably, one of the features of human writing with the most abundant use in the training set. The persuasive essay.

Literally any research it has ever been trained on uses citations. So do arguments on reddit, in message boards, on twitter, and any other situation where people try to convince each other of something using more than just "trust me bro" and, as you say "your opinion man".

What LLMs can do is impressive, but it does not understand what it is doing. Or else it would have picked the right citations for the facts and claims it asserted. That is the rub here. Many of the linked citations were wrong, but the fact or claim they were erroneously being used to support was actually correct! The LLM produced the right fact, but with the wrong supporting evidence for the fact.

That means that the LLM had access to the correct source for the information, but linked to a different source anyway. The most logical explanation for that event is that it does not understand. Not the information it is citing, or the purpose of citation.

This is where the scaffold comes in btw and user knowledge of how to leverage these tools. It is quite possible that in the future, these scaffolds will be built into the LLMs but for now the user needs to do this himself.

I have personally built tools that use LLMs in precisely the manner you're referring to and the citations work beautifully. If you want to try it out: https://www.openevidence.com/

DeeplyUnconcerned · 2026-06-01T12:40:27-0400

SraCet said:
Actually, it's pretty easy to argue that they don't even generate probabilities.

They produce a vector of numbers that humans convert to a probability distribution via softmax.

But they are not actually probabilities any more than my phone number is a probability.

If the framework doesn't treat them as a list of weights encoding relative token probabilities, the training process doesn't work.

(It also doesn't, by the same logic, say anything about tokens, it's just outputting an ordered list of numbers which is only converted to actual tokens by the framework. Only it doesn't even really do that, it just outputs a list of 1s and 0s which the framework interprets as a list. Only it doesn't even really do that, the model itself doesn't output anything, it's just an inert array of weights, it's only the framework that uses the weights to generate an output. Only it doesn't even really do that, the model is not inherently an array, it's just a cluster of 1s and 0s on the drive, which the framework loads into memory and treats as an array. Only it doesn't even really do that, there aren't really any 1s and 0s on the drive, they're just transient collections of electrons held in specialized pieces of silicon that the drive controller interprets as 1s and 0s. Only...

Or we could just treat an output that everyone knows is expressing token probabilities, as expressing token probabilities.)

SraCet · 2026-06-01T12:53:33-0400

DeeplyUnconcerned said:
If the framework doesn't treat them as a list of weights encoding relative token probabilities, the training process doesn't work.
...
Or we could just treat an output that everyone knows is expressing token probabilities, as expressing token probabilities.)

All the training process knows, for any particular instance of training data, is the correct token. Not the relative probabilities of all possible tokens.

And whether the outputs of the model are probabilities is a choice of interpretation. Instead of softmax, you could just as easily pick whatever token has the highest score, and then "probabilities" aren't involved anywhere in the process.
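The two mechanics mentioned here can be sketched as follows, assuming the common cross-entropy training setup (the thread does not say which loss any particular model uses). The scores and the target index are hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Loss for one training example; only the correct token's index is supplied."""
    return -math.log(softmax(logits)[target])

def grad_wrt_logits(logits, target):
    """Gradient of the loss above: softmax output minus a one-hot target."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def greedy(logits):
    """Pick the highest-scoring token; no softmax needed because it preserves order."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.0, -1.0]  # hypothetical scores
target = 1                 # hypothetical index of the correct next token

print(cross_entropy(logits, target))
print(grad_wrt_logits(logits, target))
print(greedy(logits))
```

Under this assumption the training signal per example is indeed a single correct index, but the loss is still computed through the softmax of the scores. Greedy decoding, by contrast, can skip the softmax entirely and return the same token.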

So what are we left with? You can't explain how backpropagation is "statistics" and even your idea that the model's outputs are "probabilities" is on shaky ground.

I'll be charitable and continue your argument for you. The only way "statistics" is involved in the whole process is that, after being trained on some data, the model is statistically more likely to reproduce the correct values when presented with the same exact training data.

But even that is kind of a useless understanding of the process.

Going back to my post about car repair, I could also say that, after adjusting the timing on my camshaft, my car is statistically less likely to pre-ignite. Does that mean car repair is "statistics"?

I wonder if you yourself have seen the claim that "LLMs are just statistics" one too many times and that has clouded your own understanding of how they work.

DeeplyUnconcerned · 2026-06-01T13:03:03-0400

SraCet said:
All the training process knows, for any particular instance of training data, is the correct token. Not the relative probabilities of all possible tokens.

And whether the outputs of the model are probabilities is a choice of interpretation. Instead of softmax, you could just as easily pick whatever token has the highest score, and then "probabilities" aren't involved anywhere in the process.

So what are we left with? You can't explain how backpropagation is "statistics" and even your idea that the model's outputs are "probabilities" is on shaky ground.

I'll be charitable and continue your argument for you. The only way "statistics" is involved in the whole process is that, after being trained on some data, the model is statistically more likely to reproduce the correct values when presented with the same exact training data.

But even that is kind of a useless understanding of the process.

Going back to my post about car repair, I could also say that, after adjusting the timing on my camshaft, my car is statistically less likely to pre-ignite. Does that mean car repair is "statistics"?

I wonder if you yourself have seen the claim that "LLMs are just statistics" one too many times and that has clouded your own understanding of how they work.

True or false: at or near the end of each forward pass (during training or inference), standard practice is to convert a list of token weights into a list of token probabilities as part of the process of selecting the next token.

wildsman · 2026-06-01T13:04:30-0400

DeeplyUnconcerned said:
If the framework doesn't treat them as a list of weights encoding relative token probabilities, the training process doesn't work.

(It also doesn't, by the same logic, say anything about tokens, it's just outputting an ordered list of numbers which is only converted to actual tokens by the framework. Only it doesn't even really do that, it just outputs a list of 1s and 0s which the framework interprets as a list. Only it doesn't even really do that, the model itself doesn't output anything, it's just an inert array of weights, it's only the framework that uses the weights to generate an output. Only it doesn't even really do that, the model is not inherently an array, it's just a cluster of 1s and 0s on the drive, which the framework loads into memory and treats as an array. Only it doesn't even really do that, there aren't really any 1s and 0s on the drive, they're just transient collections of electrons held in specialized pieces of silicon that the drive controller interprets as 1s and 0s. Only...

Or we could just treat an output that everyone knows is expressing token probabilities, as expressing token probabilities.)

You're making the point against your own argument and you don't realise. The entire point is that you can always break things down into their components.

So by saying 'it's just statistics' you've dismissed away the functional utility viewing a model at that layer of abstraction provides.

If you think the model can understand, you can then give it tasks like finding counterexamples to proofs - if you think it's just statistics, you're less likely to do that. "How can it find counterexamples across domains if it doesn't understand how many r's there are in a strawberry?"

I'm so glad the people doing real work don't think this naively.

DeeplyUnconcerned · 2026-06-01T13:19:06-0400

wildsman said:
You're making the point against your own argument and you don't realise. The entire point is that you can always break things down into their components.

So by saying 'it's just statistics' you've dismissed away the functional utility viewing a model at that layer of abstraction provides.

If you think the model can understand, you can then give it tasks like finding counterexamples to proofs - if you think it's just statistics, you're less likely to do that. "How can it find counterexamples across domains if it doesn't understand how many r's there are in a strawberry?"

I'm so glad the people doing real work don't think this naively.

That's only a loss if the net functional utility of treating a model as having understanding is positive, which is the issue in dispute.

wildsman · 2026-06-01T13:25:38-0400

DeeplyUnconcerned said:
That's only a loss if the net functional utility of treating a model as having understanding is positive, which is the issue in dispute.

No - I don't think anyone has been discussing 'net functional utility' of LLMs here. We have been discussing if 'understanding' is okay to use in the context of LLMs.

Heck, the other article with the Erdos proof should be more than enough to give you pause on that train of thought but who are we kidding right?

If you wanted to talk about 'net utility' - that's an entirely different question. And honestly, I might even agree with you that LLMs might be terrible for humanity.

SraCet · 2026-06-01T13:42:10-0400

DeeplyUnconcerned said:
True or false: at or near the end of each forward pass (during training or inference), standard practice is to convert a list of token weights into a list of token probabilities as part of the process of selecting the next token.

Standard practice is certainly to do a softmax to convert the output vector to a probability DISTRIBUTION.

If you want to call these "probabilities," there's an argument to be had even there.

Because I doubt that, if ChatGPT's model outputs a probability distribution value of 75% for a particular token, that means ChatGPT outputs that token exactly 75% of the time. I assume their process is more complicated than just randomly selecting from the softmax'ed output.

But even if the output is "probabilities," what does that have to do with statistics?

Certainly you could use statistics to determine the probability of something occurring, but that doesn't mean the thing you're observing is "statistics."

I could have a coin, and if I flip it, the odds of it landing heads are 50%. I could do a bunch of flips and use statistics to count the instances of heads and tails.

But does that mean that my coin is "statistics"? How does such a statement even make any sense?

SraCet · 2026-06-01T13:52:28-0400

wildsman said:
...
If you wanted to talk about 'net utility' - that's an entirely different question. And honestly, I might even agree with you that LLMs might be terrible for humanity.

Throw that in the bucket of nonsense that people say you're arguing but you aren't.

"Conscious," "self-aware," "alive," and now positive "net utility."

It's just strawmen all the way down.

crmarvin42 · 2026-06-01T13:57:18-0400

wildsman said:
This is where the scaffold comes in btw and user knowledge of how to leverage these tools. It is quite possible that in the future, these scaffolds will be built into the LLMs but for now the user needs to do this himself.

I have personally built tools that use LLMs in precisely the manner you're referring to and the citations work beautifully. If you want to try it out: https://www.openevidence.com/

Not gonna lie. That certainly feels and looks like a goal-post shift from here.

The LLM failed to demonstrate understanding at a fundamental level. That someone can work with the model to make it better at simulating the proper citation practices does not mean it has acquired understanding. Just that someone identified a task for which the simulation was failing to fool the audience, and worked to improve that aspect of the simulation. No different from adding ray-tracing to video games to replace static lighting effects of older generation games. The simulation got better, but it is still a simulation.

SraCet · 2026-06-01T14:00:39-0400

crmarvin42 said:
Not gonna lie. That certainly feels and looks like a goal-post shift from here.

The LLM failed to demonstrate understanding at a fundamental level. ...

And the justification for saying that hinges on the LLM being reliable re: explaining why it said a thing.

It's pretty funny that you would trust the output of an LLM to prove your point about the LLM being untrustworthy.

Voix des Airs · 2026-06-01T14:01:26-0400

crmarvin42 said:
SraCet is not worth engaging with. They use all the logical fallacies, and all the tricks to convince themself they are engaging with you, when in reality they are Gish galloping, strawmanning, or some other thing to avoid addressing any of the points you raise in opposition. They are, to be blunt, trolling everyone. Don't waste your time feeding the troll.

They created an argument, then argued that argument could not be made in good faith, then when it was pointed out the bad faith argument was theirs, they claimed it to be in good faith again.

Yeah. This whole discussion seems to have descended into an exercise in balloon squeezing (and he's not the only one doing it). I've stopped following this thread but I do want to make one last observation...

Given that I really don't know what a couple of posters here are even trying to say... knowing what I do about Tao, I found it really hard to accept that his position on such things would support what those posters seem to think (or want us to think) it does.

So I searched a little and what I came up with almost immediately were several interviews where he explains his use of AI... and it's mostly to help translate proofs from the way mathematicians write them into Lean (which is a proof verification language/tool) - which is a very time consuming and frankly annoying process. He also relates his experience that this sort of AI help is right/useful maybe 80-85% of the time - the rest being wrong in ways that look right, making the mistakes hard to find, but are in fact complete nonsense. He also makes specific note that AI is incapable of detecting contradictions in what it's doing. He analogizes his use of AI in math to the development of calculators or LaTeX. He is explicit in that AI cannot replace mathematicians and that although there is no way to know if/when a breakthrough might (or might not) happen he has a hard time seeing what we have now ever being able to do so.

So if you want to argue about this, have at it. But unless you are going to provide and stick to some concrete, formal definitions of what it is you are actually trying to say (e.g. what you mean by "understand"), I don't like this sort of discussion so it will have to be with someone else.

DeeplyUnconcerned · 2026-06-01T14:02:18-0400

wildsman said:
No - I don't think anyone has been discussing 'net functional utility' of LLMs here. We have been discussing if 'understanding' is okay to use in the context of LLMs.

Heck, the other article with the Erdos proof should be more than enough to give you pause on that train of thought but who are we kidding right?

If you wanted to talk about 'net utility' - that's an entirely different question. And honestly, I might even agree with you that LLMs might be terrible for humanity.

OK, replace the second "is" with "relies upon" and re-evaluate.

arsisloam · 2026-06-01T14:05:04-0400

SraCet said:
Throw that in the bucket of nonsense that people say you're arguing but you aren't.

"Conscious," "self-aware," "alive," and now positive "net utility."

It's just strawmen all the way down.

This is nobody's first time around the shed with wildsman. I used to think they were reasonable too, and their core point that LLMs are a kind of simulation of thinking has merit. But wildsman is way out there. They think current LLMs, as they are now, are living, thinking entities with agency.

SraCet · 2026-06-01T14:07:34-0400

Voix des Airs said:
...
So if you want to argue about this, have at it. But unless you are going to provide and stick to concrete, formal definitions of what it is you are actually trying to say (e.g. what you mean by "understand"), I don't like this sort of discussion so it will have to be with someone else.

I haven't read every single word of every single post but so far it seems like wildsman is the only person here who's trying to pin anybody down to a concrete, formal definition of what "understanding" means.

Did you post a definition of "understanding"?

SraCet · 2026-06-01T14:08:33-0400

arsisloam said:
This is nobody's first time around the shed with wildsman. I used to think they were reasonable too, and their core point that LLMs are a kind of simulation of thinking has merit. But wildsman is way out there. They think current LLMs, as they are now, are living, thinking entities with agency.

Uhh, okay. But so far I haven't seen that on this thread. How about keeping the discussion on this thread limited to... well, what's being discussed on this thread.

DeeplyUnconcerned · 2026-06-01T14:09:57-0400

SraCet said:
Standard practice is certainly to do a softmax to convert the output vector to a probability DISTRIBUTION.

If you want to call these "probabilities," there's an argument to be had even there.

Because I doubt that, if ChatGPT's model outputs a probability distribution value of 75% for a particular token, that means ChatGPT outputs that token exactly 75% of the time. I assume their process is more complicated than just randomly selecting from the softmax'ed output.

But even if the output is "probabilities," what does that have to do with statistics?

Certainly you could use statistics to determine the probability of something occurring, but that doesn't mean the thing you're observing is "statistics."

I could have a coin, and if I flip it, the odds of it landing heads are 50%. I could do a bunch of flips and use statistics to count the instances of heads and tails.

But does that mean that my coin is "statistics"? How does such a statement even make any sense?

What does "probabilities" have to do with "statistics"?

Is that a serious question?

OK, let's assume that, somehow, you are actually asking that seriously. Then: an LLM is a model which functionally takes a string of text as its input and expresses its prediction about what should come next as a list of probabilities, because an LLM is ultimately a statistical model of language (specifically, of the language in its training data).

I agree that you can make arguments about all these things, I just think they're... not good arguments. And, statistically, indicative of someone who's deep in a hole and still digging.

SraCet · 2026-06-01T14:32:14-0400

DeeplyUnconcerned said:
What does "probabilities" have to do with "statistics"?

Is that a serious question?

OK, let's assume that, somehow, you are actually asking that seriously. Then: an LLM is a model which functionally takes a string of text as its input and expresses its prediction about what should come next as a list of probabilities, because an LLM is ultimately a statistical model of language (specifically, of the language in its training data).

I agree that you can make arguments about all these things, I just think they're... not good arguments. And, statistically, indicative of someone who's deep in a hole and still digging. COAFB.

Yes, yes, I will "admit" that it's theoretically possible to view an LLM, conceptually, as a statistical model, if you're plotting input vectors into nearly-infinite-dimensional space and relating them to clusters created by the training data or whatever.

But this is just as stupid and useless as thinking of an LLM as a lookup table that's infinitely big.

Nothing about an LLM is implemented as "statistics." Nothing about the training process of an LLM is "statistics." None of the weights in the LLM are "statistics." There is nothing stored inside an LLM that signifies "the word 'am' comes after the word 'I' 20% of the time" or whatever most people think of when they think of statistics.

The idea of considering an LLM to be statistics is so far removed from what a human being could comprehend as statistics that it's counterproductive to even think about.

We might as well say that a human brain is a statistical model of the external world, for all the usefulness that provides.

What is the point when somebody on these forums says that "LLMs are statistics"? Is the point to help other people better understand how LLMs work? No, because it's not how they work. (See above.) The point is to imply that LLMs are nothing more than scaled-up versions of Bayesian inference chatbots, or n-gram completion chatbots. Which is stupid and wrong.

DeeplyUnconcerned · 2026-06-01T14:35:09-0400

SraCet said:
Yes, yes, I will "admit" that it's theoretically possible to view an LLM, conceptually, as a statistical model, if you're plotting input vectors into nearly-infinite-dimensional space and relating them to clusters created by the training data or whatever.

But this is just as stupid and useless as thinking of an LLM as a lookup table that's infinitely big.

Nothing about an LLM is implemented as "statistics." Nothing about the training process of an LLM is "statistics." None of the weights in the LLM are "statistics." There is nothing stored inside an LLM that signifies "the word 'am' comes after the word 'I' 20% of the time" or whatever most people think of when they think of statistics.

The idea of considering an LLM to be statistics is so far removed from what a human being could comprehend as statistics that it's counterproductive to even think about.

We might as well say that a human brain is a statistical model of the external world, for all the usefulness that provides.

What is the point when somebody on these forums says that "LLMs are statistics"? Is the point to help other people better understand how LLMs work? No, because it's not how they work. (See above.) The point is to imply that LLMs are nothing more than scaled-up versions of Bayesian inference chatbots, or n-gram completion chatbots. Which is stupid and wrong.

Alright, and will you also "admit" that that first paragraph is not true of at least some other approaches to AI? And therefore that, to some non-zero degree, LLMs are more about statistics than other forms of AI?

wildsman · 2026-06-01T14:36:37-0400

crmarvin42 said:
Not gonna lie. That certainly feels and looks like a goal-post shift from here.

The LLM failed to demonstrate understanding at a fundamental level. That someone can work with the model to make it better at simulating the proper citation practices does not mean it has acquired understanding.

Yet again with the binary - there is no such thing as 'acquired understanding'.

Current LLMs aren't perfect. No one is pretending they are.

But to say that they don't have any understanding of the context/training data is to fundamentally misunderstand both LLMs and what it means to 'understand'.

SraCet · 2026-06-01T14:37:41-0400

DeeplyUnconcerned said:
Alright, and will you also "admit" that that first paragraph is not true of at least some other approaches to AI? And therefore that, to some non-zero degree, LLMs are more about statistics than other forms of AI?

I already asked you this earlier, but if I say yes, what will your point be?

You didn't answer me before, so, sure.

Hypothetically: LLMs are "more about statistics" than certain other forms of AI.

What does that even imply about anything being discussed?

SraCet · 2026-06-01T14:57:25-0400

DeeplyUnconcerned said:
Alright, and will you also "admit" that that first paragraph is not true of at least some other approaches to AI? And therefore that, to some non-zero degree, LLMs are more about statistics than other forms of AI?

Oh, also, I'd like to point something out.

You know what things do have lookup tables? And statistics? And work by looking up statistics in lookup tables?

Bayesian inference chatbots and n-gram completing chatbots.

Saying that those things are lookup tables and have (or are based on) statistics is an excellent way to describe them and understand them.

None of this garbage about contorting your brain to think of an LLM, conceptually, theoretically, as an infinitely large lookup table or a statistical model in infinite-dimensional-space.

Comparing how well "lookup table/statistical model" explains n-gram chatbots vs. LLMs should pretty well sum up my complaint here.

Edit: Also, Bayesian inference chatbots actually do calculate actual probabilities. No mental gymnastics necessary.
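For contrast, the kind of explicit table described in this post can be sketched in a few lines. This is a toy bigram counter over a made-up corpus, not the implementation of any particular chatbot.

```python
from collections import Counter, defaultdict

def bigram_table(text):
    """Count, for each word, how often each next word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the stored counts for one word into relative frequencies."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

# Toy corpus, invented for illustration.
corpus = "i am here i am late i was here i think i am done"
table = bigram_table(corpus)

print(dict(table["i"]))             # the stored counts are directly readable
print(next_word_probs(table, "i"))  # "am" follows "i" in 3 of 5 cases
```

Here the statistic "word X follows word Y some fraction of the time" is literally an entry you can look up, which is the property the post says an LLM's weights do not have.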

DeeplyUnconcerned · 2026-06-01T15:19:29-0400

SraCet said:
I already asked you this earlier, but if I say yes, what will your point be?

You didn't answer me before, so, sure.

Hypothetically: LLMs are "more about statistics" than certain other forms of AI.

What does that even imply about anything being discussed?

From my POV, I think this topic of conversation goes back to here:

DeeplyUnconcerned said:
LLMs (the model, not the framework) encode a statistical model of their training data in the form of a large array of weights, in contrast to say expert systems, which encode discrete rules. LLMs are literally probabilistic, in that their outputs are a list of probabilities. They are self-evidently way more statistically-based than most other forms of AI we’ve tried, and plausibly much more statistically-based than biological intelligence. Statisticality is one of the distinguishing features of this whole approach.

It seems we're now in agreement on that. So, cool. Glad we had the last three pages of back-and-forth.

DeeplyUnconcerned · 2026-06-01T15:19:43-0400

Postscript: "LLMs are just statistics" is not my argument, and while I understand where it's coming from, I think it moderately undersells their complexity. I would though argue pretty strongly that framing LLMs as lossy compression of the total information (in the information-theoretic sense) in their training data by way of encoding concepts into vectors in very-high-dimensional space is a very useful way to mentally model their operation, both in that it allows for a manipulable model that matches their architecture and scale and many key empirical results, and in that it suggests some interesting avenues of further investigation (e.g. investigating to what degree output is related to/driven by best-fit lines in that space). I can understand how that might not be useful for everyone, but I have found it useful myself so dismissing it entirely seems unwarranted.

SraCet · 2026-06-01T15:25:35-0400

DeeplyUnconcerned said:
From my POV, I think this topic of conversation goes back to here:

It seems we're now in agreement on that. So, cool. Glad we had the last three pages of back-and-forth.

I mean, not really.

Did you read my post #509?

If you compare an LLM to an n-gram-completing chatbot, the latter of which contains actual tables of things that are readily identified by any human as "statistics," the difference is night and day.

If you have to do mental gymnastics to see how statistics are only just theoretically related to LLMs, I don't see how anybody would call that a "distinguishing feature."

SraCet · 2026-06-01T15:30:41-0400

DeeplyUnconcerned said:
Postscript: "LLMs are just statistics" is not my argument, and while I understand where it's coming from, I think it moderately undersells their complexity. I would though argue pretty strongly that framing LLMs as lossy compression of the total information (in the information-theoretic sense) in their training data by way of encoding concepts into vectors in very-high-dimensional space is a very useful way to mentally model their operation, both in that it allows for a manipulable model that matches their architecture and scale and many key empirical results, and in that it suggests some interesting avenues of further investigation (e.g. investigating to what degree output is related to/driven by best-fit lines in that space). I can understand how that might not be useful for everyone, but I have found it useful myself so dismissing it entirely seems unwarranted.

Another equally valid way of looking at LLMs is that they're functions that approximate intelligence.

They're a representation of the entire space of intelligent things that could be said in any possible situation.

Of course, we don't have a corpus of training data that only includes intelligent stuff, so we train them with all the text that we do have and hope for the best.

But, intelligence is what we've been aiming at.

graylshaped · 2026-06-01T15:34:42-0400

wildsman said:
Do you claim humans have intelligence/understanding?

You are really, really struggling with the reality that I am not the one making a claim here.

JohnDeL · 2026-06-01T15:37:47-0400

SraCet said:
Because I doubt that, if ChatGPT's model outputs a probability distribution value of 75% for a particular token, that means ChatGPT outputs that token exactly 75% of the time. I assume their process is more complicated than just randomly selecting from the softmax'ed output.

If ChatGPT's model gives a probability distribution value of 75% for a particular token in a given set of circumstances and that token doesn't show up 75% of the time in those circumstances, then that means that ChatGPT is borked. Because a PDV of 75% literally means that the event should happen 75% of the time given a specific set of circumstances.
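
A small sketch of the step being argued about, under stated assumptions: the logits are invented so that one token gets 75% from the softmax, and temperature is used as one common example of a decoding knob. Whether ChatGPT decodes this way is not something anyone in this thread has established. With plain sampling at temperature 1 the token shows up about 75% of the time; at a different temperature the same model output produces a different frequency.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_frequency(logits, temperature, n=100_000, seed=0):
    """Sample n tokens and return how often token 0 was chosen."""
    rng = random.Random(seed)
    probs = softmax(logits, temperature)
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    return draws.count(0) / n


# Hypothetical logits chosen so token 0 has probability 0.75 at temperature 1.
logits = [math.log(0.75), math.log(0.15), math.log(0.10)]

p_model = softmax(logits)[0]               # 0.75 by construction
p_sharpened = softmax(logits, 0.5)[0]      # about 0.945 analytically
freq_t1 = sample_frequency(logits, 1.0)    # empirically close to p_model
freq_t05 = sample_frequency(logits, 0.5)   # empirically close to p_sharpened
print(p_model, freq_t1, p_sharpened, freq_t05)
```

So both readings can hold at once: the sampler is faithful to whatever distribution it is handed, but the distribution it is handed need not be the raw softmax output.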

SraCet · 2026-06-01T15:38:00-0400

graylshaped said:
You are really, really struggling with the reality that I am not the one making a claim here.

Your text from post #439:

As long as we're waving degrees around, my undergraduate degree was in semiotics, and as one of those whose motives in expressing our point of view you call into question in your post, can tell you that "LLMs simply could not be as good as they are at language without at least close analogues of 'concepts' and 'understanding' " says more about your own superficial understanding of the relationships between sign and signifier and how that affects language and communication than it does about the capabilities of LLMs.

Saying that somebody is wrong about something seems like just as much of a claim as anything else?

graylshaped · 2026-06-01T15:39:36-0400

SraCet said:
Making a word italic doesn't mean you're making an argument.

wildsman has been trying to get you guys to define "understanding" in a practical, useful way for pages of comments and all you can come up with is more and more emphatic assertions that you're right.

Now you're going to have to define "useful."

It's his claim, and insisting we must prove the negative or accept his claim is about as lamely tendentious as I have seen you stoop, and boy, have you stooped tendentiously over the years.

SraCet · 2026-06-01T15:40:09-0400

JohnDeL said:
If ChatGPT's model gives a probability distribution value of 75% for a particular token in a given set of circumstances and that token doesn't show up 75% of the time in those circumstances, then that means that ChatGPT is borked. Because a PDV of 75% literally means that the event should happen 75% of the time given a specific set of circumstances.

Have you done a lot of training of DNNs? These probability distributions they give as output are pretty fluid and often don't relate very well, statistically, to the training data.

Training is a process.

graylshaped · 2026-06-01T15:50:04-0400

SraCet said:
Only indirectly, if at all.

Nothing in an LLM is counting the number of occurrences of words, phrases, word correlations, etc. in training data as it's being used to train an LLM.

Who said that matters, true or otherwise? I'm fascinated watching the strange maneuvering you are doing to pretend these models are not affected by probability and statistical methods.

JohnDeL · 2026-06-01T15:52:59-0400

SraCet said:
Have you done a lot of training of DNNs? These probability distributions they give as output are pretty fluid and often don't relate very well, statistically, to the training data.

Training is a process.

Yes, I have. I even wrote a couple of papers on how to integrate their results with other methods to give more realistic geological simulations.

If your probability distribution doesn't relate to the actual distribution, then you've fucked up. That is axiomatic to statistics.
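
One way to check whether a model's probabilities relate to the actual distribution is a reliability table: bin the predictions by confidence and compare the mean predicted probability in each bin with the observed frequency. The sketch below uses invented toy data and hypothetical function names; it shows the check itself, not results from any real model.

```python
def reliability_table(predicted, outcomes, n_bins=4):
    """Return (mean predicted prob, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            rows.append((mean_p, freq, len(b)))
    return rows


def expected_calibration_error(predicted, outcomes, n_bins=4):
    """Count-weighted average gap between predicted and observed frequency."""
    n = len(predicted)
    rows = reliability_table(predicted, outcomes, n_bins)
    return sum(abs(mean_p - freq) * count / n for mean_p, freq, count in rows)


# Toy data (made up): eight predictions of 0.75 each.
predicted = [0.75] * 8
calibrated = [1, 1, 1, 1, 1, 1, 0, 0]      # event happened 6/8 of the time
overconfident = [1, 1, 1, 1, 0, 0, 0, 0]   # event happened 4/8 of the time

ece_good = expected_calibration_error(predicted, calibrated)
ece_bad = expected_calibration_error(predicted, overconfident)
print(ece_good, ece_bad)
```

A gap near zero means the stated 75% behaves like 75%; a large gap is the mismatch being described in the quoted post.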
