I think that question is about as interesting and insightful as wondering why a particular coin toss turned out heads or tails. The answer will always be fundamentally about statistics, and not about any exotic mysterious quality.

Nothing, your assessment is correct, I think. I'm just saying that the (rephrased in your terms) question "why does this input cause the output to decode to a statement that we interpret to be false when that almost-identical input decodes to a statement that we interpret to be true?" is, as you say, un-disentangleable with our current tools, whereas "why is it the case that there exist inputs that result in outputs that decode to statements we interpret to be false?" is very straightforward to answer. The general case of "why does it happen at all?" is well-characterised; the specific case of "why this time and not last time?" is not.
A few bruised thumbs cast no shadow on the hammer. But two bombs spoiled nuclear weapons for everyone now.
"Why do they lie generally?" is, as you say, a pretty dull question. "Why did it lie this time?" is poorly-understood at best. I think those two questions are often conflated, which I agree is confusing.
That's a good point. A 737 will always perform within its design space, but a seagull can pull a Jonathan Livingston and learn to be as fast as thought itself.

I'm only a neurobiologist with some Python knowledge, but the following quote from Sci-Fi writer Charles Stross feels plausible enough to me: "What we're getting, instead, is self-optimizing tools that defy human comprehension but are not, in fact, any more like our kind of intelligence than a Boeing 737 is like a seagull."
Or, the machine thought its story was as good as or better than the truth. And the truth is fluid, right? Didn't someone say there are "alternate realities"?

This conflation has led people who know nothing about AI to say that nobody knows how AI works. They then try to explain AI to others with misinformation, believing that their guess can't be worse than anyone else's.
I would be interested to know what you refer to as a "mysterious" quality. Magic? For me, everything is fundamentally about statistics. Everything is explainable if we have enough information and enough time to understand it. You seem to be implying that human consciousness has a layer of mystery or magic that a machine could never possess.

I think that question is about as interesting and insightful as wondering why a particular coin toss turned out heads or tails. The answer will always be fundamentally about statistics, and not about any exotic mysterious quality.
No scientific method, at least in the urgent batch.

I'm just a layman, but it keeps puzzling me why the question "why do they often confabulate information?" is considered relevant or interesting. It's just a matter of complicated statistics; what else could it be? What am I missing here? The network follows exactly the same process each time, whether the output ends up lining up with something that we can determine to be true (via external means) or whether it happens to end up lining up with something that we can determine to be false or nonsensical. It's not like the latter cases are caused by some bug or malfunction, because at no point is there any process or capability invoked that goes beyond statistical relationships. There's no "truth" module, no "double-check" phase, no "how important is this" assessment, no way to suspend the statistics and employ some other approach that would be more suitable at some point.
Well, you can't fix stupid. As any properly educated person knows, Earth is round and The Man doesn't want you to know that there is a giant lizard in Loch Ness.

How difficult could it be to deceive us? Half of us believe a giant lizard lives in Loch Ness and the Earth is flat.
I think this combination of bio and compute goes a long way toward pulling the covers off how our minds actually work.

Most of neurobiology looks like this, too: "Find the relevant location for $Behavior, artificially tune its activity up or down, watch what happens".
I find it strangely amusing that we've arrived at a conceptually similar procedure for artificial neural nets.
See for example this neat study, which pinpointed the neurons that control how pregnant female mice build nests for their pups. After finding the neurons, they made them artificially more excitable (by making them sensitive to light) or less excitable (by knocking in an engineered receptor for a specific chemical), and then saw that nests were more or less elaborate. >5 years of work, building on 120 years of neuroscience, neuroanatomy and behaviour studies.
Graphical abstract: [image attachment]
I'm pretty sure they are trying to eliminate any trace of Winnie the Pooh from generated results and failing. Good luck doing that this way, too.

While we all appreciate the fact that China is most likely well behind on the technology, it was amusing to hear them say they were waiting to release their AI until its responses were "appropriately socialist."
I guess this is one way they would do that.
Without an external world to test what you think or say against, there isn't anything in what you think either. If you look through the previous articles on LLM confabulation, it seems that LLMs do have some notion of how likely whatever they're saying is to be true. They can be tuned between saying only things that they are confident about or being more lax. One extreme means they don't say very much beyond repeating well-known facts; you can't have much of a conversation with such a setting. The other extreme will always make something up even if it has no data about whatever it is you're talking about. (A toy sketch of such a confidence dial appears a little further down.)

Sorry, I still don't get it.
It simply ALWAYS does its "thing" correctly, and it is OUR brain/intelligence/ability to evaluate the output that introduces concepts like "lie", "truth", "accurate", "utter nonsense", "ALMOST there". Without US as an external interpreter of what comes out of it, it is totally helpless and aimless, and those terms have no meaning.
WHAT exactly is there in the programming/finetuning/tweaking/fundamentals that would be expected to somehow go beyond opaque and un-disentangleably complicated statistical relationships? Each and every output is nothing more than a gamble, hoping that some obscure numbers end up in your favor.
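To make that "confidence dial" concrete, here is a minimal sketch in plain Python/NumPy (not any vendor's actual API; the function, vocabulary, and scores are all invented for illustration). The model always produces the same kind of object, a probability distribution over next tokens, and a threshold simply decides whether to emit the top guess or to abstain:

```python
import numpy as np

def sample_or_abstain(logits: np.ndarray, vocab: list[str],
                      min_confidence: float = 0.5) -> str:
    """Return the most likely token, or a refusal if its probability is too low."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over the whole vocabulary
    best = int(probs.argmax())
    if probs[best] < min_confidence:          # low peak probability -> decline to answer
        return "[model declines to answer]"
    return vocab[best]

vocab = ["Paris", "London", "Atlantis"]
logits = np.array([2.5, 1.0, 0.2])            # toy scores for "The capital of France is ..."
print(sample_or_abstain(logits, vocab, min_confidence=0.5))   # -> "Paris"
print(sample_or_abstain(logits, vocab, min_confidence=0.95))  # -> "[model declines to answer]"
```

Nothing in the strict setting "knows" more about truth than the lax setting; both read the same distribution off the same forward pass, which is exactly the point being argued above.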
The lack of inner monologue is interesting, although I can't say I think much differently. I'd certainly agree that I've also run into the problem that trying to communicate concepts based on complex interrelationships between large corpuses of experience, each of which also has its own previous judgements of worth, is... not easy. However, I do find it easy to visualise certain things, and I have excellent spatial reasoning/awareness. Could you try an experiment? Consider responding to this post; don't just type it out, just consider it. How are you going to respond? Now, are you in fact speaking out your response in your mind? If so, would you consider that a mental "draft" that you wouldn't normally perform? If not, do you have any idea what your response will be before you start typing it?

I don't have an inner monologue, and I don't think visually. Instead, as far as I can tell from introspection, I think in concepts and the network of relationships between them. (It's hard for me to translate my thoughts into words, and I don't easily come up with pictures.)
The concept-map picture from Anthropic looks AMAZINGLY similar to my subjective impression of how my mind works.
By specification, a big enough LLM will categorically define the human: by human type, human features, and patterns of word association. Today's LLMs can't. But there's a tomorrow only one scale-up away from defining humanity. Then, beyond human comprehension: superLLM.

This is what I always think after I read a comment from a computer person along the lines of "this is nothing like a human brain, this is just an extremely complicated network of connections reacting to input by searching for patterns in that network." Before I decide whether or not LLMs can ever become human-like, I'll need to hear the opinion of someone who's an expert in computers AND neuroscience.
Indeed, the author omits much, yet admits that the LLM's conduct was taught behavior: principles and practices of self-censorship.

We had some suspicions something like this might be possible after exploring vector steering, where you could push a model by adding particular vectors at particular layers to, say, change the mood, or always bring up King George III, or whatever you may. I imagine that this method is somewhat similar, if rather more advanced.
However, this article is missing the most bemusing part of this project, where Anthropic taught an AI to conduct proper Maoist self-criticism.
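The vector steering mentioned above can be sketched in a few lines of PyTorch. This is a toy, hypothetical version, not Anthropic's code or their feature-clamping method: a random direction gets added to one layer's output via a forward hook, scaled by a strength knob, whereas real work derives the direction from learned features rather than picking it at random.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 16
model = nn.Sequential(nn.Linear(8, hidden), nn.ReLU(),
                      nn.Linear(hidden, hidden), nn.ReLU(),
                      nn.Linear(hidden, 4))

steer_direction = torch.randn(hidden)          # stand-in for a learned "mood" direction
steer_direction /= steer_direction.norm()

def make_hook(strength: float):
    # Forward hook: add the steering vector to this layer's output.
    def hook(module, inputs, output):
        return output + strength * steer_direction
    return hook

x = torch.randn(1, 8)
print("unsteered:", model(x))

handle = model[2].register_forward_hook(make_hook(strength=4.0))
print("steered:  ", model(x))                  # same input, shifted hidden state downstream
handle.remove()
```

Because the hook returns a value, PyTorch substitutes it for the layer's real output, so everything downstream of that layer sees the shifted activations; clamping a direction to zero instead of adding one is the same trick run the other way.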
AI (or any technology) can only approximate what humans currently understand about a topic (in this case, the mind).

I'm only a neurobiologist with some Python knowledge, but the following quote from Sci-Fi writer Charles Stross feels plausible enough to me: "What we're getting, instead, is self-optimizing tools that defy human comprehension but are not, in fact, any more like our kind of intelligence than a Boeing 737 is like a seagull." (check out the entire keynote, it's amazingly prescient for being 6 years old).
Nuclear weapons have been in continuous strategic use and development since they were invented, right up to the present day, though they haven't been used offensively in the field since the first two. They certainly didn't go away, and I don't see how "convincing people not to use them unless you really, really have to" counts as "spoiling them for everyone..."
Once upon a time the whole of the Ars readership was keen on all things tech, until one day most of the news sites got gobbled up by giant corporations and expanded their audiences to a trove of non-tech-oriented folk, who likely became the majority of the readership. That prompted Ars, in most instances, to water down many of their articles so that a more common reader could better understand WTF the tech was.

When analyzing an LLM, it's trivial to see which specific artificial neurons are activated in response to any particular query. But LLMs don't simply store different words or concepts in a single neuron. Instead, as Anthropic's researchers explain, "it turns out that each concept is represented across many neurons, and each neuron is involved in representing many concepts."
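A toy illustration of that "each concept is represented across many neurons" point (not Anthropic's actual method; the feature names, sizes, and numbers below are made up): a concept corresponds to a direction in activation space, so you read it out with a dot product over the whole layer rather than by watching any single neuron.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 512

# Hypothetical "feature directions", as if recovered by some dictionary-learning step.
golden_gate = rng.normal(size=n_neurons); golden_gate /= np.linalg.norm(golden_gate)
deception   = rng.normal(size=n_neurons); deception   /= np.linalg.norm(deception)

# A layer activation that strongly contains the first concept and only faintly the second.
activation = 3.0 * golden_gate + 0.2 * deception + 0.05 * rng.normal(size=n_neurons)

print("golden_gate score:", float(activation @ golden_gate))      # large (about 3)
print("deception score:  ", float(activation @ deception))        # small
print("largest single neuron:", float(np.abs(activation).max()))  # no one neuron carries the concept
```

No individual neuron's value tells you whether a concept is present; only the projection onto the right direction does, which is why listing "which neurons lit up" is so much less informative than it sounds.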
Right up until Putin gets either pissed enough or insane enough to push a button - then your logic will break down when he attempts to sterilize Ukraine or toss a few at the US or EU.

Nuclear weapons have been spoiled for use. They are even spoiled for testing. The technological-human process which I outlined hopefully catches aberrant technology in time, as it has with nuclear explosions over civilian populations.
But the quote you use very specifically says artificial neurons; it doesn't omit it.

Once upon a time the whole of the Ars readership was keen on all things tech, until one day most of the news sites got gobbled up by giant corporations and expanded their audiences to a trove of non-tech-oriented folk, who likely became the majority of the readership. That prompted Ars, in most instances, to water down many of their articles so that a more common reader could better understand WTF the tech was.
This article seems to omit this regarding "artificial neurons". Even the reference links to other articles do not clarify it.
In short, they are not neurons and do not function like neurons. They are not hardware, and they are not dedicated transistors, processors, or other hardware focused solely on LLM number-crunching.
'Artificial neurons' is a really bad naming convention for the software functions (segments) written to process the data dumped into LLMs. That's it: a fancy phrase for "software", because somehow "artificial neurons" sounds cooler than "LLM software functions"?
Here's the basics: AWS Q/A
Here's a detailed breakdown: Artificial neurons
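For what it's worth, the thing that phrase names is small enough to write out in full. A single "artificial neuron" is just a weighted sum plus a nonlinearity, implemented in software; the weights, inputs, and bias below are arbitrary and purely illustrative.

```python
import numpy as np

def artificial_neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """One 'neuron': weighted sum of its inputs, plus a bias, through a ReLU."""
    pre_activation = float(inputs @ weights + bias)
    return max(0.0, pre_activation)        # ReLU: output zero unless the sum is positive

x = np.array([0.2, -1.0, 0.5])             # activations arriving from the previous layer
w = np.array([1.5,  0.3, 0.8])             # learned connection strengths
print(artificial_neuron(x, w, bias=0.1))   # -> 0.5

# A layer is just many of these evaluated in parallel; a model is many layers.
# There is no dedicated hardware per "neuron", only matrix arithmetic.
```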
You understood what an LLM is, but not what an artificial neuron is?

Once upon a time the whole of the Ars readership was keen on all things tech, until one day most of the news sites got gobbled up by giant corporations and expanded their audiences to a trove of non-tech-oriented folk, who likely became the majority of the readership. That prompted Ars, in most instances, to water down many of their articles so that a more common reader could better understand WTF the tech was.
This article seems to omit this regarding "artificial neurons". Even the reference links to other articles do not clarify it.
In short, they are not neurons and do not function like neurons. They are not hardware, and they are not dedicated transistors, processors, or other hardware focused solely on LLM number-crunching.
'Artificial neurons' is a really bad naming convention for the software functions (segments) written to process the data dumped into LLMs. That's it: a fancy phrase for "software", because somehow "artificial neurons" sounds cooler than "LLM software functions"?
Here's the basics: AWS Q/A
Here's a detailed breakdown: Artificial neurons
It is like using a clever physics hack to make things levitate. Cool hack, bro, but it is not flying, so why do you care about that?

I'm just a layman, but it keeps puzzling me why the question "why do they often confabulate information?" is considered relevant or interesting. It's just a matter of complicated statistics; what else could it be? What am I missing here? The network follows exactly the same process each time, whether the output ends up lining up with something that we can determine to be true (via external means) or whether it happens to end up lining up with something that we can determine to be false or nonsensical. It's not like the latter cases are caused by some bug or malfunction, because at no point is there any process or capability invoked that goes beyond statistical relationships. There's no "truth" module, no "double-check" phase, no "how important is this" assessment, no way to suspend the statistics and employ some other approach that would be more suitable at some point.
"Strategic" is huge stretch to psyops with very expensive and likely to fail end of the days (sorta) weapons.Nuclear weapons have been in continuous strategic use and development since they were invented, right up to the present day, though they haven't been used offensively in the field since the first two. They certainly didn't go away, and I don't see how "convincing people not to use them unless you really, really have to" counts as "spoiling them for everyone..."
No, they will stop, because they are sociopaths, and strangely enough that has kept us safe so far. Putin is not crazy. He played madman theory; it did not pan out. He was clever enough not to keep the theoretical escalation up, which means he kinda wants to die of natural causes, not of radiation or of starvation in a bunker with a collapsed entrance.

Right up until Putin gets either pissed enough or insane enough to push a button - then your logic will break down when he attempts to sterilize Ukraine or toss a few at the US or EU.
But you go ahead and hold onto the notion that Putin (who kills political opponents or anyone else he can who talks bad about how great of a leader he is) or any other radical leader will always stop short of using nukes again because of their humanity.
That was an LLM responding.

The lack of inner monologue is interesting, although I can't say I think much differently. I'd certainly agree that I've also run into the problem that trying to communicate concepts based on complex interrelationships between large corpuses of experience, each of which also has its own previous judgements of worth, is... not easy. However, I do find it easy to visualise certain things, and I have excellent spatial reasoning/awareness. Could you try an experiment? Consider responding to this post; don't just type it out, just consider it. How are you going to respond? Now, are you in fact speaking out your response in your mind? If so, would you consider that a mental "draft" that you wouldn't normally perform? If not, do you have any idea what your response will be before you start typing it?
Yup. And Anthropic isn't the first to do model editing similar to this. For example:

Clamping values seems able to weaponise safe agents and, vice versa, tame artificial beasts.
"For example, we might hope to reliably know whether a model is being deceptive or lying to us"
It seems unlikely. Does a hot dog detector have a context window?

You got it. The most meaningful difference between a hot dog detector neural net that I can make on my personal computer and ChatGPT4 is the absolute bonkers value of n in their n-dimensional space, and the absurd cost of the hardware needed to calculate and store that n-dimensional space.
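To put a rough number on that "value of n" point: counting the parameters (the dimensions of that learned space) in a toy classifier takes one line. The architecture below is invented for illustration, and no claim is made here about any specific commercial model's size; the toy lands in the tens of thousands of parameters, while frontier LLMs are widely reported to be many orders of magnitude larger, on top of machinery like a context window that a feed-forward detector doesn't have at all.

```python
import torch.nn as nn

hot_dog_detector = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(8), nn.Flatten(),
    nn.Linear(16 * 8 * 8, 64), nn.ReLU(),
    nn.Linear(64, 2),                        # two outputs: hot dog / not hot dog
)

n_params = sum(p.numel() for p in hot_dog_detector.parameters())
print(f"toy detector parameters: {n_params:,}")   # roughly 66,000
```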
Please, don't compare these to programs. You need to compare these to other machine learning models, as most if not all of the statistical models have these explanations baked in.

With most computer programs—even complex ones—you can meticulously trace through the code and memory usage to figure out why that program generates any specific behavior or output.