I think that question is about as interesting and insightful as wondering why a particular coin toss turned out heads or tails. The answer will always be fundamentally about statistics, and not about any exotic mysterious quality.

Nothing, your assessment is correct, I think. I'm just saying that the (rephrased in your terms) question "why does this input cause the output to decode to a statement that we interpret to be false when that almost-identical input decodes to a statement that we interpret to be true?" is, as you say, un-disentangleable with our current tools, whereas "why is it the case that there exist inputs that result in outputs that decode to statements we interpret to be false?" is very straightforward to answer. The general case of "why does it happen at all?" is well-characterised; the specific case of "why this time and not last time?" is not.
A few bruised thumbs cast no shadow on the hammer. But two bombs spoiled nuclear weapons for everyone now.
"Why do they lie generally?" is, as you say, a pretty dull question. "Why did it lie this time?" is poorly-understood at best. I think those two questions are often conflated, which I agree is confusing.
That's a good point. A 737 will always perform within its design space, but a seagull can pull a Jonathan Livingston and learn to be as fast as thought itself.

I'm only a neurobiologist with some Python knowledge, but the following quote from Sci-Fi writer Charles Stross feels plausible enough to me: "What we're getting, instead, is self-optimizing tools that defy human comprehension but are not, in fact, any more like our kind of intelligence than a Boeing 737 is like a seagull."
Or, the machine thought its story was as good as or better than the truth. And the truth is fluid, right? Didn't someone say there are "alternate realities"?

This conflation has led people who know nothing about AI to say that nobody knows how AI works. They then try to explain AI to others with misinformation, believing that their guess can't be worse than anyone else's.
I would be interested to know what you refer to as a "mysterious" quality. Magic? For me, everything is fundamentally about statistics. Everything is explainable if we have enough information and enough time to understand it. You seem to be implying that human consciousness has a layer of mystery or magic that a machine could never possess.

I think that question is about as interesting and insightful as wondering why a particular coin toss turned out heads or tails. The answer will always be fundamentally about statistics, and not about any exotic mysterious quality.
No scientific method, at least in the urgent batch.

I'm just a layman, but it keeps puzzling me why the question "why do they often confabulate information?" is considered relevant or interesting. It's just a matter of complicated statistics; what else could it be? What am I missing here? The network follows exactly the same process each time, whether the output ends up lining up with something that we can determine to be true (via external means) or whether it happens to end up lining up with something that we can determine to be false or nonsensical. It's not like the latter cases are caused by some bug or malfunction, because at no point is there any process or capability invoked that goes beyond statistical relationships. There's no "truth" module, no "double-check" phase, no "how important is this" assessment, no way to suspend the statistics and employ some other approach that would be more suitable at some point.
Well, you can't fix stupid. As any properly educated person knows, Earth is round and The Man doesn't want you to know that there is a giant lizard in Loch Ness.

How difficult could it be to deceive us? Half of us believe a giant lizard lives in Loch Ness and the Earth is flat.
I think this combination of bio and compute goes a long way toward pulling the covers off how our minds actually work.

Most of neurobiology looks like this, too: "Find the relevant location for $Behavior, artificially tune its activity up or down, watch what happens".
I find it strangely amusing that we've arrived at a conceptually similar procedure for artificial neural nets.
See for example this neat study, which pinpointed the neurons that control how pregnant female mice build nests for their pups. After finding the neurons, they made them artificially more excitable (by making them sensitive to light) or less excitable (by knocking in an engineered receptor for a specific chemical), and then saw that nests were more or less elaborate. >5 years of work, building on 120 years of neuroscience, neuroanatomy and behaviour studies.
Graphical abstract: [image attachment]
I'm pretty sure they are trying to eliminate any trace of Winnie the Pooh from generated results and failing. Good luck doing that this way, too.

While we all appreciate the fact that China is most likely well behind on the technology, it was amusing to hear them say they were waiting to release their AI until its responses were "appropriately socialist."
I guess this is one way they would do that.
Without an external world to test what you think or say against, there isn't anything in what you think either. If you look through the previous articles on LLM confabulation, it seems that LLMs do have some notion of how likely whatever they're saying is to be true. They can be tuned between saying only things that they are confident about or being more lax. One extreme means they don't say very much beyond repeating well-known facts; you can't have much of a conversation with such a setting. The other extreme will always make something up even if it has no data about whatever it is you're talking about. (A toy sketch of such a confidence dial appears a little further down.)

Sorry, I still don't get it.
It simply ALWAYS does its "thing" correctly, and it is OUR brain/intelligence/ability to evaluate the output that introduces concepts like "lie", "truth", "accurate", "utter nonsense", "ALMOST there". Without US as an external interpreter of what comes out of it, it is totally helpless and aimless, and those terms have no meaning.
WHAT exactly is there in the programming/finetuning/tweaking/fundamentals that would be expected to somehow go beyond opaque and un-disentangleably complicated statistical relationships? Each and every output is nothing more than a gamble, hoping that some obscure numbers end up in your favor.
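To make that "confidence dial" concrete, here is a minimal sketch in plain Python/NumPy (not any vendor's actual API; the function, vocabulary, and scores are all invented for illustration). The model always produces the same kind of object, a probability distribution over next tokens, and a threshold simply decides whether to emit the top guess or to abstain:

```python
import numpy as np

def sample_or_abstain(logits: np.ndarray, vocab: list[str],
                      min_confidence: float = 0.5) -> str:
    """Return the most likely token, or a refusal if its probability is too low."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over the whole vocabulary
    best = int(probs.argmax())
    if probs[best] < min_confidence:          # low peak probability -> decline to answer
        return "[model declines to answer]"
    return vocab[best]

vocab = ["Paris", "London", "Atlantis"]
logits = np.array([2.5, 1.0, 0.2])            # toy scores for "The capital of France is ..."
print(sample_or_abstain(logits, vocab, min_confidence=0.5))   # -> "Paris"
print(sample_or_abstain(logits, vocab, min_confidence=0.95))  # -> "[model declines to answer]"
```

Nothing in the strict setting "knows" more about truth than the lax setting; both read the same distribution off the same forward pass, which is exactly the point being argued above.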
The lack of inner monologue is interesting, although I can't say I think much differently. I'd certainly agree that I've also run into the problem that trying to communicate concepts based on complex interrelationships between large corpuses of experience, each of which also has its own previous judgements of worth, is... not easy. However, I do find it easy to visualise certain things, and I have excellent spatial reasoning/awareness. Could you try an experiment? Consider responding to this post; don't just type it out, just consider it. How are you going to respond? Now, are you in fact speaking out your response in your mind? If so, would you consider that a mental "draft" that you wouldn't normally perform? If not, do you have any idea what your response will be before you start typing it?

I don't have an inner monologue, and I don't think visually. Instead, as far as I can tell from introspection, I think in concepts and the network of relationships between them. (It's hard for me to translate my thoughts into words, and I don't easily come up with pictures.)
The concept-map picture from Anthropic looks AMAZINGLY similar to my subjective impression of how my mind works.
By specification, a big enough LLM will categorically define the human: by human type, human features, and patterns of word association. Today's LLMs can't. But there's a tomorrow only one scale-up away from defining humanity. Then, beyond human comprehension: superLLM.

This is what I always think after I read a comment from a computer person along the lines of "this is nothing like a human brain, this is just an extremely complicated network of connections reacting to input by searching for patterns in that network." Before I decide whether or not LLMs can ever become human-like, I'll need to hear the opinion of someone who's an expert in computers AND neuroscience.
Indeed, the author omits much, yet admits that the LLM's conduct was taught behavior: principles and practices of self-censorship.

We had some suspicions something like this might be possible after exploring vector steering, where you could push a model by adding particular vectors at particular layers to, say, change the mood, or always bring up King George III, or whatever you may. I imagine that this method is somewhat similar, if rather more advanced.
However, this article is missing the most bemusing part of this project, where Anthropic taught an AI to conduct proper Maoist self-criticism.
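The vector steering mentioned above can be sketched in a few lines of PyTorch. This is a toy, hypothetical version, not Anthropic's code or their feature-clamping method: a random direction gets added to one layer's output via a forward hook, scaled by a strength knob, whereas real work derives the direction from learned features rather than picking it at random.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 16
model = nn.Sequential(nn.Linear(8, hidden), nn.ReLU(),
                      nn.Linear(hidden, hidden), nn.ReLU(),
                      nn.Linear(hidden, 4))

steer_direction = torch.randn(hidden)          # stand-in for a learned "mood" direction
steer_direction /= steer_direction.norm()

def make_hook(strength: float):
    # Forward hook: add the steering vector to this layer's output.
    def hook(module, inputs, output):
        return output + strength * steer_direction
    return hook

x = torch.randn(1, 8)
print("unsteered:", model(x))

handle = model[2].register_forward_hook(make_hook(strength=4.0))
print("steered:  ", model(x))                  # same input, shifted hidden state downstream
handle.remove()
```

Because the hook returns a value, PyTorch substitutes it for the layer's real output, so everything downstream of that layer sees the shifted activations; clamping a direction to zero instead of adding one is the same trick run the other way.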
AI (or any technology) can only approximate what humans currently understand about a topic (in this case, the mind).

I'm only a neurobiologist with some Python knowledge, but the following quote from Sci-Fi writer Charles Stross feels plausible enough to me: "What we're getting, instead, is self-optimizing tools that defy human comprehension but are not, in fact, any more like our kind of intelligence than a Boeing 737 is like a seagull." (check out the entire keynote, it's amazingly prescient for being 6 years old).
Nuclear weapons have been in continuous strategic use and development since they were invented, right up to the present day, though they haven't been used offensively in the field since the first two. They certainly didn't go away, and I don't see how "convincing people not to use them unless you really, really have to" counts as "spoiling them for everyone..."
Once upon a time the whole of the Ars readership was keen on all things tech, until one day most of the news sites got gobbled up by giant corporations and expanded their audiences to a trove of non-tech-oriented folk, who likely became the majority of the readership. That prompted Ars, in most instances, to water down many of their articles so that a more common reader could better understand WTF the tech was.

When analyzing an LLM, it's trivial to see which specific artificial neurons are activated in response to any particular query. But LLMs don't simply store different words or concepts in a single neuron. Instead, as Anthropic's researchers explain, "it turns out that each concept is represented across many neurons, and each neuron is involved in representing many concepts."
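A toy illustration of that "each concept is represented across many neurons" point (not Anthropic's actual method; the feature names, sizes, and numbers below are made up): a concept corresponds to a direction in activation space, so you read it out with a dot product over the whole layer rather than by watching any single neuron.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 512

# Hypothetical "feature directions", as if recovered by some dictionary-learning step.
golden_gate = rng.normal(size=n_neurons); golden_gate /= np.linalg.norm(golden_gate)
deception   = rng.normal(size=n_neurons); deception   /= np.linalg.norm(deception)

# A layer activation that strongly contains the first concept and only faintly the second.
activation = 3.0 * golden_gate + 0.2 * deception + 0.05 * rng.normal(size=n_neurons)

print("golden_gate score:", float(activation @ golden_gate))      # large (about 3)
print("deception score:  ", float(activation @ deception))        # small
print("largest single neuron:", float(np.abs(activation).max()))  # no one neuron carries the concept
```

No individual neuron's value tells you whether a concept is present; only the projection onto the right direction does, which is why listing "which neurons lit up" is so much less informative than it sounds.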
Right up until Putin gets either pissed enough or insane enough to push a button - then your logic will break down when he attempts to sterilize Ukraine or toss a few at the US or EU.

Nuclear weapons have been spoiled for use. They are even spoiled for testing. The technological-human process which I outlined hopefully catches aberrant technology in time, as it has with nuclear explosions over civilian populations.
But the quote you use very specifically says artificial neurons; it doesn't omit it.

Once upon a time the whole of the Ars readership was keen on all things tech, until one day most of the news sites got gobbled up by giant corporations and expanded their audiences to a trove of non-tech-oriented folk, who likely became the majority of the readership. That prompted Ars, in most instances, to water down many of their articles so that a more common reader could better understand WTF the tech was.
This article seems to omit this regarding "artificial neurons". Even the reference links to other articles do not clarify it.
In short, they are not neurons and do not function like neurons. They are not hardware, and they are not dedicated transistors, processors, or other hardware focused solely on LLM number-crunching.
'Artificial neurons' is a really bad naming convention for the software functions (segments) written to process the data dumped into LLMs. That's it: a fancy phrase for "software", because somehow "artificial neurons" sounds cooler than "LLM software functions"?
Here's the basics: AWS Q/A
Here's a detailed breakdown: Artificial neurons
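For what it's worth, the thing that phrase names is small enough to write out in full. A single "artificial neuron" is just a weighted sum plus a nonlinearity, implemented in software; the weights, inputs, and bias below are arbitrary and purely illustrative.

```python
import numpy as np

def artificial_neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """One 'neuron': weighted sum of its inputs, plus a bias, through a ReLU."""
    pre_activation = float(inputs @ weights + bias)
    return max(0.0, pre_activation)        # ReLU: output zero unless the sum is positive

x = np.array([0.2, -1.0, 0.5])             # activations arriving from the previous layer
w = np.array([1.5,  0.3, 0.8])             # learned connection strengths
print(artificial_neuron(x, w, bias=0.1))   # -> 0.5

# A layer is just many of these evaluated in parallel; a model is many layers.
# There is no dedicated hardware per "neuron", only matrix arithmetic.
```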
You understood what an LLM is, but not what an artificial neuron is?

Once upon a time the whole of the Ars readership was keen on all things tech, until one day most of the news sites got gobbled up by giant corporations and expanded their audiences to a trove of non-tech-oriented folk, who likely became the majority of the readership. That prompted Ars, in most instances, to water down many of their articles so that a more common reader could better understand WTF the tech was.
This article seems to omit this regarding "artificial neurons". Even the reference links to other articles do not clarify it.
In short, they are not neurons and do not function like neurons. They are not hardware, and they are not dedicated transistors, processors, or other hardware focused solely on LLM number-crunching.
'Artificial neurons' is a really bad naming convention for the software functions (segments) written to process the data dumped into LLMs. That's it: a fancy phrase for "software", because somehow "artificial neurons" sounds cooler than "LLM software functions"?
Here's the basics: AWS Q/A
Here's a detailed breakdown: Artificial neurons
It is like using a clever physics hack to make things levitate. Cool hack, bro, but it is not flying, so why do you care about that?

I'm just a layman, but it keeps puzzling me why the question "why do they often confabulate information?" is considered relevant or interesting. It's just a matter of complicated statistics; what else could it be? What am I missing here? The network follows exactly the same process each time, whether the output ends up lining up with something that we can determine to be true (via external means) or whether it happens to end up lining up with something that we can determine to be false or nonsensical. It's not like the latter cases are caused by some bug or malfunction, because at no point is there any process or capability invoked that goes beyond statistical relationships. There's no "truth" module, no "double-check" phase, no "how important is this" assessment, no way to suspend the statistics and employ some other approach that would be more suitable at some point.
"Strategic" is huge stretch to psyops with very expensive and likely to fail end of the days (sorta) weapons.Nuclear weapons have been in continuous strategic use and development since they were invented, right up to the present day, though they haven't been used offensively in the field since the first two. They certainly didn't go away, and I don't see how "convincing people not to use them unless you really, really have to" counts as "spoiling them for everyone..."
No, they will stop, because they are sociopaths, and strangely enough that has kept us safe so far. Putin is not crazy. He played madman theory; it did not pan out. He was clever enough not to keep the theoretical escalation up, which means he kinda wants to die of natural causes, not of radiation or of starvation in a bunker with a collapsed entrance.

Right up until Putin gets either pissed enough or insane enough to push a button - then your logic will break down when he attempts to sterilize Ukraine or toss a few at the US or EU.
But you go ahead and hold onto the notion that Putin (who kills political opponents or anyone else he can who talks bad about how great of a leader he is) or any other radical leader will always stop short of using nukes again because of their humanity.
That was an LLM responding.

The lack of inner monologue is interesting, although I can't say I think much differently. I'd certainly agree that I've also run into the problem that trying to communicate concepts based on complex interrelationships between large corpuses of experience, each of which also has its own previous judgements of worth, is... not easy. However, I do find it easy to visualise certain things, and I have excellent spatial reasoning/awareness. Could you try an experiment? Consider responding to this post; don't just type it out, just consider it. How are you going to respond? Now, are you in fact speaking out your response in your mind? If so, would you consider that a mental "draft" that you wouldn't normally perform? If not, do you have any idea what your response will be before you start typing it?
Yup. And Anthropic isn't the first to do model editing similar to this. For example:

Clamping values seems able to weaponise safe agents and, vice versa, tame artificial beasts.
"For example, we might hope to reliably know whether a model is being deceptive or lying to us"
It seems unlikely. Does a hot dog detector have a context window?

You got it. The most meaningful difference between a hot dog detector neural net that I can make on my personal computer and ChatGPT4 is the absolute bonkers value of n in their n-dimensional space, and the absurd cost of the hardware needed to calculate and store that n-dimensional space.
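To put a rough number on that "value of n" point: counting the parameters (the dimensions of that learned space) in a toy classifier takes one line. The architecture below is invented for illustration, and no claim is made here about any specific commercial model's size; the toy lands in the tens of thousands of parameters, while frontier LLMs are widely reported to be many orders of magnitude larger, on top of machinery like a context window that a feed-forward detector doesn't have at all.

```python
import torch.nn as nn

hot_dog_detector = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(8), nn.Flatten(),
    nn.Linear(16 * 8 * 8, 64), nn.ReLU(),
    nn.Linear(64, 2),                        # two outputs: hot dog / not hot dog
)

n_params = sum(p.numel() for p in hot_dog_detector.parameters())
print(f"toy detector parameters: {n_params:,}")   # roughly 66,000
```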
Please, don't compare these to programs. You need to compare these to other machine learning models, as most if not all of the statistical models have these explanations baked in.

With most computer programs—even complex ones—you can meticulously trace through the code and memory usage to figure out why that program generates any specific behavior or output.