Fine-tuning tests show "bias ... toward confidently representing the claims as true."
Not specifically, no. But they do make the point that birds and dolphins can count and thus could conceivably count the number of Rs in the word strawberry.
Do these articles talk about the times when a dolphin or bird counted the number of Rs in the word strawberry...?
And I'm sure there are instances of LLMs appearing to understand numbers, the concept of numerically "less than," etc. just as much as those observations of birds and dolphins.
Not specifically, no. But they do make the point that birds and dolphins can count and thus could conceivably count the number of Rs in the word strawberry.
Also, BTW, I'll point out that LLMs have no idea why they say the things they do....
What LLMs can do is impressive, but they do not understand what they are doing. Or else they would have picked the right citations for the facts and claims they asserted. That is the rub here. Many of the linked citations were wrong, but the facts or claims they were erroneously being used to support were actually correct! The LLM produced the right fact, but with the wrong supporting evidence for the fact.
...
By "training method", I mean the whole training process, not just backprop. But it doesn't matter to the argument; you can assert that LLM training involves no statistics whatsoever and just fall back on the fact that the output is expressed as probabilities, which gets you your "LLMs use probability more than expert systems do" endpoint without getting distracted.
I'd like to see you explain/defend this assertion.
What is "statistical" about backpropagation?
Nice shifting of the goalposts.
And I'm sure there are instances of LLMs appearing to understand numbers, the concept of numerically "less than," etc. just as much as those observations of birds and dolphins.
I just gave ChatGPT a short sequence of comma-separated numbers and asked how many times the number 1 occurred in that sequence and it nailed it.
So I guess ChatGPT is intelligent, right?
And yet we ascribe intelligence to animals (birds, dolphins, etc.) that can obviously not count the numbers of letters in words, etc.
Actually, it's pretty easy to argue that they don't even generate probabilities.
By "training method", I mean the whole training process, not just backprop. But it doesn't matter to the argument; you can assert that LLM training involves no statistics whatsoever and just fall back on the fact that the output is expressed as probabilities, which gets you your "LLMs use probability more than expert systems do" endpoint without getting distracted.
Yeah, I said dolphins and birds can't count the number of letters in a word, and you provided no evidence to the contrary.
Nice shifting of the goalposts.
Your original statement was:
I have merely provided evidence that your knowledge of the field is lacking.
This is where the scaffold comes in btw and user knowledge of how to leverage these tools. It is quite possible that in the future, these scaffolds will be built into the LLMs but for now the user needs to do this himself.
The piece that jumped out as a benchmark, and proof of lack of understanding was not the content area stuff, but the 100% broken and worthless citations. Something that the training set has mountains of examples for. This is, arguably, one of the features of human writing with the most abundant use in the training set. The persuasive essay.
Literally any research it has ever been trained on uses citations. So do arguments on reddit, in message boards, on twitter, and any other situation where people try to convince each other of something using more than just "trust me bro" and, as you say "your opinion man".
What LLMs can do is impressive, but they do not understand what they are doing. Or else they would have picked the right citations for the facts and claims they asserted. That is the rub here. Many of the linked citations were wrong, but the facts or claims they were erroneously being used to support were actually correct! The LLM produced the right fact, but with the wrong supporting evidence for the fact.
That means that the LLM had access to the correct source for the information, but linked to a different source anyway. The most logical explanation for that event is that it does not understand. Not the information it is citing, or the purpose of citation.
If the framework doesn't treat them as a list of weights encoding relative token probabilities, the training process doesn't work.
Actually, it's pretty easy to argue that they don't even generate probabilities.
They produce a vector of numbers that humans convert to a probability distribution via softmax.
But they are not actually probabilities any more than my phone number is a probability.
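For reference, here is a minimal sketch of the softmax step being argued about, in plain Python. The three-element `logits` list is made up for illustration; a real model emits one score per vocabulary entry. The sketch only shows that the conversion is a fixed normalization applied to the raw scores and that it preserves their ranking. It does not settle whether the result deserves the name "probabilities."

```python
import math

def softmax(logits):
    """Map arbitrary real scores to positive values that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical raw scores for a 3-token vocabulary.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

# Softmax is monotonic, so the top-scoring token is the same before and after.
top_before = argmax(logits)
top_after = argmax(probs)
```

Because the ranking is unchanged, picking the highest raw score and picking the highest softmax value always select the same token.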
All the training process knows, for any particular instance of training data, is the correct token. Not the relative probabilities of all possible tokens.
If the framework doesn't treat them as a list of weights encoding relative token probabilities, the training process doesn't work.
...
Or we could just treat an output that everyone knows is expressing token probabilities, as expressing token probabilities.)
True or false: at or near the end of each forward pass (during training or inference), standard practice is to convert a list of token weights into a list of token probabilities as part of the process of selecting the next token.
All the training process knows, for any particular instance of training data, is the correct token. Not the relative probabilities of all possible tokens.
And whether the outputs of the model are probabilities is a choice of interpretation. Instead of softmax, you could just as easily pick whatever token has the highest score, and then "probabilities" aren't involved anywhere in the process.
So what are we left with? You can't explain how backpropagation is "statistics" and even your idea that the model's outputs are "probabilities" is on shaky ground.
I'll be charitable and continue your argument for you. The only way "statistics" is involved in the whole process is that, after being trained on some data, the model is statistically more likely to reproduce the correct values when presented with the same exact training data.
But even that is kind of a useless understanding of the process.
Going back to my post about car repair, I could also say that, after adjusting the timing on my camshaft, my car is statistically less likely to pre-ignite. Does that mean car repair is "statistics"?
I wonder if you yourself have seen the claim that "LLMs are just statistics" one too many times and that has clouded your own understanding of how they work.
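For readers following the training side of this exchange, here is a hedged sketch of the loss most commonly used for next-token training: cross-entropy against the single correct token. It assumes the standard textbook setup, not any particular vendor's pipeline, and the function names and four-token vocabulary are hypothetical. It illustrates both points made above: the only label is the index of the correct token, and the loss is computed on the softmax of the raw scores.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Loss for one training position: -log of the softmax value
    assigned to the one token that actually came next."""
    return -math.log(softmax(logits)[target_index])

def grad_wrt_logits(logits, target_index):
    """Gradient of that loss with respect to the raw scores:
    softmax(logits) minus a one-hot vector for the correct token."""
    probs = softmax(logits)
    return [p - (1.0 if i == target_index else 0.0)
            for i, p in enumerate(probs)]

# Hypothetical example: 4-token vocabulary, all scores equal, token 2 is correct.
logits = [0.0, 0.0, 0.0, 0.0]
loss = cross_entropy(logits, 2)     # equals log(4) for a uniform output
grad = grad_wrt_logits(logits, 2)   # pushes token 2 up, all others down
```

The gradient raises the score of the correct token and lowers every other score in proportion to its current softmax value.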
You're making the point against your own argument and you don't realise. The entire point is that you can always break things down into its components.
If the framework doesn't treat them as a list of weights encoding relative token probabilities, the training process doesn't work.
(It also doesn't, by the same logic, say anything about tokens, it's just outputting an ordered list of numbers which is only converted to actual tokens by the framework. Only it doesn't even really do that, it just outputs a list of 1s and 0s which the framework interprets as a list. Only it doesn't even really do that, the model itself doesn't output anything, it's just an inert array of weights, it's only the framework that uses the weights to generate an output. Only it doesn't even really do that, the model is not inherently an array, it's just a cluster of 1s and 0s on the drive, which the framework loads into memory and treats as an array. Only it doesn't even really do that, there aren't really any 1s and 0s on the drive, they're just transient collections of electrons held in specialized pieces of silicon that the drive controller interprets as 1s and 0s. Only...
Or we could just treat an output that everyone knows is expressing token probabilities, as expressing token probabilities.)
That's only a loss if the net functional utility of treating a model as having understanding is positive, which is the issue in dispute.
You're making the point against your own argument and you don't realise. The entire point is that you can always break things down into its components.
So by saying 'it's just statistics' you've dismissed the functional utility that viewing a model at that layer of abstraction provides.
If you think the model can understand, you can then give it tasks like finding counterexamples to proofs - if you think it's just statistics, you're less likely to do that. "How can it find counterexamples across domains if it doesn't understand how many r's there are in a strawberry?"
I'm so glad the people doing real work don't think this naively.
No - I don't think anyone has been discussing 'net functional utility' of LLMs here. We have been discussing if 'understanding' is okay to use in the context of LLMs.
That's only a loss if the net functional utility of treating a model as having understanding is positive, which is the issue in dispute.
Standard practice is certainly to do a softmax to convert the output vector to a probability DISTRIBUTION.
True or false: at or near the end of each forward pass (during training or inference), standard practice is to convert a list of token weights into a list of token probabilities as part of the process of selecting the next token.
Throw that in the bucket of nonsense that people say you're arguing but you aren't....
If you wanted to talk about 'net utility' - that's an entirely different question. And honestly, I might even agree with you that LLMs might be terrible for humanity.
Not gonna lie. That certainly feels and looks like a goal-post shift from here.
This is where the scaffold comes in btw and user knowledge of how to leverage these tools. It is quite possible that in the future, these scaffolds will be built into the LLMs but for now the user needs to do this himself.
I have personally built tools that use LLMs in precisely the manner you're referring to and the citations work beautifully. If you want to try it out: https://www.openevidence.com/
And the justification for saying that hinges on the LLM being reliable re: explaining why it said a thing.
Not gonna lie. That certainly feels and looks like a goal-post shift from here.
The LLM failed to demonstrate understanding at a fundamental level. ...
SraCet is not worth engaging with. They use all the logical fallacies, and all the tricks to convince themselves they are engaging with you, when in reality they are Gish galloping, strawmanning, or doing some other thing to avoid addressing any of the points you raise in opposition. They are, to be blunt, trolling everyone. Don't waste your time feeding the troll.
They created an argument, then argued that argument could not be made in good faith, then when it was pointed out the bad faith argument was theirs, they claimed it to be in good faith again.
OK, replace the second "is" with "relies upon" and re-evaluate.
No - I don't think anyone has been discussing 'net functional utility' of LLMs here. We have been discussing if 'understanding' is okay to use in the context of LLMs.
Heck, the other article with the Erdos proof should be more than enough to give you pause on that train of thought but who are we kidding right?
If you wanted to talk about 'net utility' - that's an entirely different question. And honestly, I might even agree with you that LLMs might be terrible for humanity.
This is nobody's first time around the shed with wildsman. I used to think they were reasonable too, and their core point that LLMs are a kind of simulation of thinking has merit. But wildsman is way out there. They think current LLMs, as they are now, are living, thinking entities with agency.
Throw that in the bucket of nonsense that people say you're arguing but you aren't.
"Conscious," "self-aware," "alive," and now positive "net utility."
It's just strawmen all the way down.
I haven't read every single word of every single post but so far it seems like wildsman is the only person here who's trying to pin anybody down to a concrete, formal definition of what "understanding" means....
So if you want to argue about this, have at it. But unless you are going to provide and stick to concrete, formal definitions of what it is you are actually trying to say (e.g. what you mean by "understand"), I don't like this sort of discussion so it will have to be with someone else.
Uhh, okay. But so far I haven't seen that on this thread. How about keeping the discussion on this thread limited to... well, what's being discussed on this thread.
This is nobody's first time around the shed with wildsman. I used to think they were reasonable too, and their core point that LLMs are a kind of simulation of thinking has merit. But wildsman is way out there. They think current LLMs, as they are now, are living, thinking entities with agency.
What does "probabilities" have to do with "statistics"?
Standard practice is certainly to do a softmax to convert the output vector to a probability DISTRIBUTION.
If you want to call these "probabilities," there's an argument to be had even there.
Because I doubt that, if ChatGPT's model outputs a probability distribution value of 75% for a particular token, that means ChatGPT outputs that token exactly 75% of the time. I assume their process is more complicated than just randomly selecting from the softmax'ed output.
But even if the output is "probabilities," what does that have to do with statistics?
Certainly you could use statistics to determine the probability of something occurring, but that doesn't mean the thing you're observing is "statistics."
I could have a coin, and if I flip it, the odds of it landing heads is 50%. I could do a bunch of flips and use statistics to count the instances of heads and tails.
But does that mean that my coin is "statistics"? How does such a statement even make any sense?
Yes, yes, I will "admit" that it's theoretically possible to view an LLM, conceptually, as a statistical model, if you're plotting input vectors into nearly-infinite-dimensional space and relating them to clusters created by the training data or whatever.
What does "probabilities" have to do with "statistics"?
Is that a serious question?
OK, let's assume that, somehow, you are actually asking that seriously. Then: an LLM is a model which functionally takes a string of text as its input and expresses its prediction about what should come next as a list of probabilities, because an LLM is ultimately a statistical model of language (specifically, of the language in its training data).
I agree that you can make arguments about all these things, I just think they're... not good arguments. And, statistically, indicative of someone who's deep in a hole and still digging. COAFB.
Alright, and will you also "admit" that that first paragraph is not true of at least some other approaches to AI? And therefore that, to some non-zero degree, LLMs are more about statistics than other forms of AI?
Yes, yes, I will "admit" that it's theoretically possible to view an LLM, conceptually, as a statistical model, if you're plotting input vectors into nearly-infinite-dimensional space and relating them to clusters created by the training data or whatever.
But this is just as stupid and useless as thinking of an LLM as a lookup table that's infinitely big.
Nothing about an LLM is implemented as "statistics." Nothing about the training process of an LLM is "statistics." None of the weights in the LLM are "statistics." There is nothing stored inside an LLM that signifies "the word 'am' comes after the word 'I' 20% of the time" or whatever most people think of when they think of statistics.
The idea of considering an LLM to be statistics is so far removed from what a human being could comprehend as statistics that it's counterproductive to even think about.
We might as well say that a human brain is a statistical model of the external world, for all the usefulness that provides.
What is the point when somebody on these forums says that "LLMs are statistics"? Is the point to help other people better understand how LLMs work? No, because it's not how they work. (See above.) The point is to imply that LLMs are nothing more than scaled-up versions of Bayesian inference chatbots, or n-gram completion chatbots. Which is stupid and wrong.
Yet again with the binary - there is no such thing as 'acquired understanding'.
Not gonna lie. That certainly feels and looks like a goal-post shift from here.
The LLM failed to demonstrate understanding at a fundamental level. That someone can work with the model to make it better at simulating the proper citation practices does not mean it has acquired understanding.
I already asked you this earlier, but if I say yes, what will your point be?
Alright, and will you also "admit" that that first paragraph is not true of at least some other approaches to AI? And therefore that, to some non-zero degree, LLMs are more about statistics than other forms of AI?
Oh, also, I'd like to point something out.
Alright, and will you also "admit" that that first paragraph is not true of at least some other approaches to AI? And therefore that, to some non-zero degree, LLMs are more about statistics than other forms of AI?
From my POV, I think this topic of conversation goes back to here:
I already asked you this earlier, but if I say yes, what will your point be?
You didn't answer me before, so, sure.
Hypothetically: LLMs are "more about statistics" than certain other forms of AI.
What does that even imply about anything being discussed?
It seems we're now in agreement on that. So, cool. Glad we had the last three pages of back-and-forth.
LLMs (the model, not the framework) encode a statistical model of their training data in the form of a large array of weights, in contrast to say expert systems, which encode discrete rules. LLMs are literally probabilistic, in that their outputs are a list of probabilities. They are self-evidently way more statistically-based than most other forms of AI we’ve tried, and plausibly much more statistically-based than biological intelligence. Statisticality is one of the distinguishing features of this whole approach.
I mean, not really.
From my POV, I think this topic of conversation goes back to here:
It seems we're now in agreement on that. So, cool. Glad we had the last three pages of back-and-forth.
Another equally valid way of looking at LLMs is that they're functions that approximate intelligence.
Postscript: "LLMs are just statistics" is not my argument, and while I understand where it's coming from, I think it moderately undersells their complexity. I would though argue pretty strongly that framing LLMs as lossy compression of the total information (in the information-theoretic sense) in their training data by way of encoding concepts into vectors in very-high-dimensional space is a very useful way to mentally model their operation, both in that it allows for a manipulable model that matches their architecture and scale and many key empirical results, and in that it suggests some interesting avenues of further investigation (e.g. investigating to what degree output is related to/driven by best-fit lines in that space). I can understand how that might not be useful for everyone, but I have found it useful myself so dismissing it entirely seems unwarranted.
You are really, really struggling with the reality that I am not the one making a claim here.
Do you claim humans have intelligence/understanding?
Because I doubt that, if ChatGPT's model outputs a probability distribution value of 75% for a particular token, that means ChatGPT outputs that token exactly 75% of the time. I assume their process is more complicated than just randomly selecting from the softmax'ed output.
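On the question of whether a softmax value of 75% means the token is emitted 75% of the time: that depends on the decoding settings. The sketch below shows one common option, temperature scaling, under the assumption that sampling is done from the rescaled distribution. The two-token `logits` list is contrived so that plain softmax gives exactly 75%, and nothing here claims to describe what ChatGPT actually does.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the raw scores by the temperature, then apply softmax.
    Values below 1.0 sharpen the distribution, values above 1.0 flatten it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Contrived scores: plain softmax assigns 0.75 to token 0.
logits = [math.log(3.0), 0.0]

p_default = softmax_with_temperature(logits, 1.0)  # token 0 sampled about 75% of the time
p_sharper = softmax_with_temperature(logits, 0.5)  # token 0 sampled about 90% of the time
```

With greedy decoding (always taking the top score) the same token would be emitted every time, so the emitted frequency matches the softmax value only when sampling directly from the unmodified distribution.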
Your text from post #439:
You are really, really struggling with the reality that I am not the one making a claim here.
Now you're going to have to define "useful."
Making a word italic doesn't mean you're making an argument.
wildsman has been trying to get you guys to define "understanding" in a practical, useful way for pages of comments and all you can come up with is more and more emphatic assertions that you're right.
Have you done a lot of training of DNNs? These probability distributions they give as output are pretty fluid and often don't relate very well, statistically, to the training data.
If ChatGPT's model gives a probability distribution value of 75% for a particular token in a given set of circumstances and that token doesn't show up 75% of the time in those circumstances, then that means that ChatGPT is borked. Because a PDV of 75% literally means that the event should happen 75% of the time given a specific set of circumstances.
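The property in dispute here, whether events predicted at 75% actually occur about 75% of the time, is usually called calibration, and it can be measured directly. Below is a rough sketch of such a check. The function name, the tolerance, and the toy data are all hypothetical, and it assumes you have logged pairs of predicted value and actual outcome.

```python
def observed_frequency(predicted, outcomes, target=0.75, tol=0.05):
    """Among predictions within `tol` of `target`, return the fraction
    of cases where the predicted event actually happened.
    Returns None if no prediction falls in that range."""
    hits = [o for p, o in zip(predicted, outcomes) if abs(p - target) <= tol]
    if not hits:
        return None
    return sum(hits) / len(hits)

# Toy data: four predictions near 0.75, one far away that gets ignored.
predicted = [0.75, 0.74, 0.76, 0.75, 0.20]
outcomes = [1, 1, 1, 0, 0]  # 1 = the predicted token was the correct one

freq = observed_frequency(predicted, outcomes)
# A well-calibrated model gives freq close to 0.75 on a large sample;
# a large gap means the softmax values are not reliable frequencies.
```

A gap between `freq` and the target on a large sample would support the "fluid" description above, while a small gap would support reading the outputs as frequencies.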
Who said that matters, true or otherwise? I'm fascinated watching the strange maneuvering you are doing to pretend these models are not affected by probability and statistical methods.
Only indirectly, if at all.
Nothing in an LLM is counting the number of occurrences of words, phrases, word correlations, etc. in training data as it's being used to train an LLM.
Yes, I have. I even wrote a couple of papers on how to integrate their results with other methods to give more realistic geological simulations.
Have you done a lot of training of DNNs? These probability distributions they give as output are pretty fluid and often don't relate very well, statistically, to the training data.
Training is a process.