Here’s what’s really going on inside an LLM’s neural network

Status
You're currently viewing only DeeplyUnconcerned's posts. Click here to go back to viewing the entire thread.

DeeplyUnconcerned

Ars Scholae Palatinae
1,017
Subscriptor++
If I were writing the original paper, I'd rephrase this:
"the features are likely to be a faithful part of how the model internally represents the world, and how it uses these representations in its behavior."
as this:
"the features are likely to be a faithful part of how the model internally represents the statistical relationships in its training data, and how it uses these representations in its behavior."

It's a subtle difference, but I think it's a more accurate (and more comprehensible?) framing. The model isn't encoding the world, it's encoding the statistical correlations in its gigantic training data. To the (reasonably high) extent that that training data is reflective of the real world, it's indirectly representing the world, but it still doesn't know what a bridge is, it just knows that (among very many other things) it's a token pattern that frequently occurs in a similar sort of context to the token pattern "viaduct", and has a relationship to "river" that's similar (but not identical) to its relationship to "road".

It really is just, as others have said, an n-dimensional coding of the probability space of its training data. This research is cool and neat and I approve of it!
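The "token pattern that occurs in a similar context to 'viaduct'" idea can be sketched as geometry: in an embedding space, related tokens simply sit close together, and "knowing" a relationship is nothing more than comparing angles between vectors. The four-dimensional vectors below are invented for illustration (real models learn thousands of dimensions from training data):

```python
import math

# Toy hand-made "embeddings" -- invented numbers, purely illustrative.
vectors = {
    "bridge":  [0.90, 0.80, 0.10, 0.30],
    "viaduct": [0.85, 0.75, 0.15, 0.25],
    "river":   [0.20, 0.90, 0.80, 0.10],
    "banana":  [0.10, 0.05, 0.20, 0.90],
}

def cosine(a, b):
    # Cosine similarity: dot product normalised by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "bridge" is closer to "viaduct" than to "banana" purely because of how
# the numbers are laid out -- no concept of a bridge is involved anywhere.
print(cosine(vectors["bridge"], vectors["viaduct"])
      > cosine(vectors["bridge"], vectors["banana"]))  # True
```

The model's "understanding" of bridge-vs-viaduct is exactly this kind of numeric proximity, just at vastly higher dimensionality and learned rather than hand-written.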
 
Upvote
72 (84 / -12)

DeeplyUnconcerned

Ars Scholae Palatinae
1,017
Subscriptor++
I'm just a layman, but it keeps puzzling me why the question "why they often confabulate information" is considered relevant or interesting. It's just a matter of complicated statistics, what else could it be? What am I missing here? The network follows exactly the same process each time - whether the output ends up lining up with something that we can determine to be true (via external means) or whether it happens to end up lining up with something that we can determine to be false or nonsensical. It's not like the latter cases are caused by some bug or malfunction. Because at no point is there any process or capability invoked that goes beyond statistical relationships. There's no "truth" module, no "double-check" phase, no "how important is this" assessment, no way to suspend the statistics and employ some other approach that would be more suitable at some point.
"Why do they lie generally?" is, as you say, a pretty dull question. "Why did it lie this time?" is poorly-understood at best. I think those two questions are often conflated, which I agree is confusing.
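The "same process each time" point can be made concrete with a minimal sketch of next-token sampling (a simplified stand-in for what an LLM's decoding step does, not any particular model's implementation): the scores are softmaxed and a token is drawn, and there is no branch anywhere that checks whether the resulting text is true.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token index from raw scores. This is the whole 'decision'."""
    # Softmax: turn scores into a probability distribution.
    exps = [math.exp(score / temperature) for score in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample proportionally. Note: no "truth module", no double-check --
    # factual and confabulated continuations flow through identical arithmetic.
    r = rng.random()
    cumulative = 0.0
    for token_id, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return token_id
    return len(probs) - 1

# Hypothetical scores for four candidate tokens.
print(sample_next_token([2.0, 1.0, 0.5, 0.1]))
```

Whether the sampled token happens to continue a true statement or a false one, the code path above is byte-for-byte the same, which is why "why does confabulation exist at all?" has a boring answer while "why this output, this time?" does not.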
 
Upvote
26 (27 / -1)

DeeplyUnconcerned

Ars Scholae Palatinae
1,017
Subscriptor++
Sorry, I still don't get it.

It simply ALWAYS does its "thing" correctly, and it is OUR brain/intelligence/ability to evaluate the output that introduces concepts like "lie", "truth", "accurate", "utter nonsense", "ALMOST there". Without US as an external interpreter of what comes out of it, it is totally helpless and aimless, and those terms have no meaning.

WHAT exactly is there in the programming/finetuning/tweaking/fundamentals that would be expected to somehow go beyond opaque and un-disentangleably complicated statistical relationships? Each and every output is nothing more than a gamble, hoping that some obscure numbers end up in your favor.
Nothing, your assessment is correct I think. I’m just saying that the (rephrased in your terms) question “why does this input cause the output to decode to a statement that we interpret to be false when that almost-identical input decodes to a statement that we interpret to be true?” is, as you say, un-disentangleable with our current tools, whereas “why is it the case that there exist inputs that result in outputs that decode to statements we interpret to be false?” is very straightforward to answer. The general case of “why does it happen at all?” is well-characterised; the specific case of “why this time and not last time?” is not.
 
Upvote
10 (10 / 0)