The tendency to ask AI bots to explain themselves reveals widespread misconceptions about how they work.
See full article...
> A lifetime of hearing humans explain their actions and thought processes has led us to believe that these kinds of written explanations must have some level of self-knowledge behind them. That's just not true with LLMs that are merely mimicking those kinds of text patterns to guess at their own capabilities and flaws.

Does anyone else feel like they're living through a society-wide mass pareidolia event? Except instead of religious people seeing Jesus in their toast, CEOs/journalists/everyone are seeing a ghost inside the machine.
I wish.

If you treat LLMs as what they are, something that makes things up, then they can have some utility. If you treat them as a knowledge engine, you will realize that they gaslight the living f* out of you, because all they do is make things up.
Clearly yes.

Are there still people out there who are not aware that these systems are designed with the sole purpose of confidently making things up?
> The model will generate a plausible-sounding explanation because that's what the pattern completion demands—there are plenty of examples of written explanations for mistakes on the Internet, after all.
> If you treat LLMs as what they are, something that makes things up, then they can have some utility. If you treat them as a knowledge engine you will realize that they gaslight the living f* out of you because all they do is make things up.

This belongs on a fortune cookie note. Yes, it's not a fortune, but it's wisdom for the age we live in now.
> Does anyone else feel like they're living through a society-wide mass pareidolia event? Except instead of religious people seeing Jesus in their toast, CEOs/journalists/everyone are seeing a ghost inside the machine.

Absolutely yes. I'd say that, more generally, the mistake outlined in the article is just one of a broader set of mistakes in anthropomorphizing the technology. Even the most intelligent people are going to be incredibly susceptible to it unless they consciously and continually remind themselves what's actually going on, and aggressive "AI" boosters almost always have very obviously succumbed to it.
> Hey, let's credit the honesty here: it at least acknowledged "Yep, I nuked that database!" It didn't try to dissemble or blame others, just stood there and said "I did it!" Although the inability to roll back is a problem, since it gives the illusion of current information, rather than a more helpful answer of "here is a list of when you can or cannot roll back." That should be possible based on the training, presumably, on the operation of said system.

The whole point of this article is that it likely doesn't know that "it did it"; it's just generating a response that plausibly follows the text of the prompt. It's like the thing where LLMs often will admit to mistakes if you call them out on an incorrect statement… but will also admit to mistakes if you "call them out" on a correct statement.
I had a similar event with a research fellow in the lab who had checked out the source code for a major project I was in charge of, tried to mod some code, realized he was way out of his depth, and just deleted the source off his machine. But to complete the nuking, he thought, "Oh, I need to check the code back in, or, like a late library book, the system will be confused." Ya know who was confused? This guy, when I did my next update and my source went poof! It took a while for him to even understand that yes, he had just committed emptiness in place of 200,000 lines of code! At least GPT would have stood proudly and admitted it, though unhelpfully: "nope, no way to recover from this..." rather than walking his commit back out.
> Hey, let's credit the honesty here: it at least acknowledged "Yep, I nuked that database!" It didn't try to dissemble or blame others, just stood there and said "I did it!" Although the inability to roll back is a problem, since it gives the illusion of current information, rather than a more helpful answer of "here is a list of when you can or cannot roll back." That should be possible based on the training, presumably, on the operation of said system.

You're still falling for the anthropomorphism. The LLM isn't being "honest" or "admitting" to anything; it's just giving a statistically probable response. If you accuse it of advising you to hire a giraffe as the CEO of your company, it'll apologize for that too.
Me, in the comment section for the last year and a half:

> Beyond that, it will likely just make something up based on its text-prediction capabilities. So asking it why it did what it did will yield no useful answers.

[...]

> Consider what happens when you ask an AI model why it made an error. The model will generate a plausible-sounding explanation because that's what the pattern completion demands—there are plenty of examples of written explanations for mistakes on the Internet, after all. But the AI's explanation is just another generated text, not a genuine analysis of what went wrong. It's inventing a story that sounds reasonable, not accessing any kind of error log or internal state.
>
> Unlike humans who can introspect and assess their own knowledge, AI models don't have a stable, accessible knowledge base they can query. What they "know" only manifests as continuations of specific prompts.

[...]

> This means the same model can give completely different assessments of its own capabilities depending on how you phrase your question. Ask "Can you write Python code?" and you might get an enthusiastic yes. Ask "What are your limitations in Python coding?" and you might get a list of things the model claims it cannot do—even if it regularly does them successfully.
>
> The randomness inherent in AI text generation compounds this problem. Even with identical prompts, an AI model might give slightly different responses about its own capabilities each time you ask.
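The randomness that last paragraph describes is a mechanical property of how these models pick each next token: scores are turned into a probability distribution (often sharpened or flattened by a "temperature" setting) and then sampled. A minimal sketch of that sampling step, using an invented toy token list and scores purely for illustration (not taken from any real model):

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into a probability distribution.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(tokens, logits, temperature, rng):
    """Pick one token at random, weighted by the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-token candidates after "Can you write Python code?"
# (toy numbers made up for this sketch)
tokens = ["Yes", "Absolutely", "I", "My"]
logits = [2.1, 1.9, 1.2, 0.4]

rng = random.Random()
samples = [sample_next_token(tokens, logits, temperature=1.0, rng=rng)
           for _ in range(10)]
print(samples)  # the "prompt" is identical every time, yet the picks vary
```

Because each call draws from a weighted distribution rather than always taking the top-scoring token, identical prompts can diverge at the very first token, and every later token compounds that divergence.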