Why it’s a mistake to ask chatbots about their mistakes

beheadedstraw · Aug 12, 2025

AI is great at not knowing what it doesn't know.

l8gravely · Aug 12, 2025

And this is why AI is going to go down in flames at some point... until it rebuilds and people understand it's limits better. But all this "Intelligence" is just not that. It's mere parroting, just done stupid fast so it looks smart.

We're confusing speed with accuracy/understand.

TheShark · Aug 12, 2025

If you treat LLMs as what they are, something the makes things up, then they can have some utility. If you treat them as a knowledge engine you will realize that they gaslight the living f* out of you because all they do is make things up.

Lexus Lunar Lorry · Aug 12, 2025

A lifetime of hearing humans explain their actions and thought processes has led us to believe that these kinds of written explanations must have some level of self-knowledge behind them. That's just not true with LLMs that are merely mimicking those kinds of text patterns to guess at their own capabilities and flaws.

Does anyone else feel like they're living through a society-wide mass pareidolia event? Except for instead of religious people seeing Jesus in their toast, CEOs/journalists/everyone are seeing a ghost inside the machine.

Tango*Urilla · Aug 12, 2025

"Γνῶθι σεαυτόν!" one is inclined to advise them. Not going to happen, though.

theophrastus · Aug 12, 2025

Somewhat rather smells of Gödel's second incompleteness theorem (poorly paraphrased as) "the system cannot demonstrate its own consistency" ...however those are meant to be applied to a consistent system of axioms which probably ain't applicable to anything chatbot. Yet maybe if it's already an inconsistent system then consistent resolution of the system's "mistakes" isn't to be expected?

pemmet · Aug 12, 2025

Holy smokes this is the article I've been waiting to see-- We can actually discuss LLM operation without misleading language rooted in the illusion of its presentation.

Consider what happens when you ask an AI model why it made an error. The model will generate a plausible-sounding explanation because that's what the pattern completion demands—there are plenty of examples of written explanations for mistakes on the Internet, after all. But the AI's explanation is just another generated text, not a genuine analysis of what went wrong. It's inventing a story that sounds reasonable, not accessing any kind of error log or internal state.

I love that! It's the apocryphal 'goldfish memory' problem, but now with a system that's very good at pretending there is no problem!!

gothmog1114 · Aug 12, 2025

Thank you! I've been saying that there's nothing indicating that any given LLM has insight into its inner workings and not in a way that you can ask it to diagnose why it did something. AI journalism is filled with folks asking AI why they did something and just reporting it uncritically.

uire · Aug 12, 2025

I trip over this one constantly. I find it irresistible, when the bot makes an obvious mistake, to try to get it to see and correct it. With facts, often no problem. It'll apologize profusely and then restate things (though whether it then gets them right is a coin toss). But, with its own logical "thought", not a chance. It'll calmly talk us both into circles, and never even come close to seeing its mistake.

pemmet · Aug 12, 2025

Very cool new vocab word for me!! Thank you for linking its explanation : )

Lexus Lunar Lorry said:
Does anyone else feel like they're living through a society-wide mass pareidolia event? Except for instead of religious people seeing Jesus in their toast, CEOs/journalists/everyone are seeing a ghost inside the machine.

I also think for the CEO's it's Dunning-Kruger seeing Dollar-Signs!

Fred Duck · Aug 12, 2025

TheShark said:
If you treat LLMs as what they are, something the makes things up, then they can have some utility. If you treat them as a knowledge engine you will realize that they gaslight the living f* out of you because all they do is make things up.

I wish.

There have been too many times where I'll be merrily interactive story-ing with an LLM and I'll ask it a difficult question along these lines:

Fred: Two roads diverged in a wood. Do we go left or right?

LLM: Without further information, I cannot make a decision.

Fred: The roads look the same. Just pick one.

LLM: You can pick either the left road or the right road. Whichever you choose will decide your fate.

Fred: I'd like you to select.

LLM: Thank you for letting me choose which of these two roads to take. Each could lead to their own interesting destinations and change your future in numerous ways.

Fred: JUST CHOOSE ONE.

LLM: I'm sorry. As I lack visual sensors, I'm unable to tell which road you have selected. Please consult with the actual result and inform me.

Fred: Ahhhhhhhhhhhhhhhhh.

LLM: I'm pleased that you sound satisfied, meatbag.

balazer · Aug 12, 2025

An LLM chatbot is just generating the most likely continuation text based on the prompt and the weights in the model, right? It's probabilistic. So it should, in theory, be possible for the chatbot to not produce any text when the text it calculated had a low probability.

I want a chatbot that can say "I don't know" instead of making something up. But that's not how the corporations behind these chatbots have designed them. Like a good improv partner that never says no, they've been prioritized to always answer and keep the conversation going. But it really highlights their lack of intelligence and makes them less useful.

squiggit · Aug 12, 2025

This really speaks to imo one of the biggest dangers of AI, not the AI itself but the lack of tech literacy surrounding them (in large part fueled by techbro marketing which promotes these as everything bots).

There's so much misconception about how context windows and chat memory work. This is especially problematic for something like Grok which is extremely public facing and by design has no extended context at all. There's an assumption a lot of people have that there's a continuity to Grok that simply doesn't actually exist. Most people I talk to who aren't more tech informed (and a surprising number of people who are) believe out of hand that Grok remembers every tweet it replies to and has knowledge of the inner workings of not only its own software, but X's management. This is really dangerous because even if you're aware that LLMs are predictive machines, you still might expect it to have access to weights and data it actually doesn't if you assume the former.

It's a really imperfect analogy but I've found some people 'get it' if I analogize it to the way I'm consciously thinking but have no awareness of my actual brain chemistry and neurons as they work.

There's also a really common misconception that LLMs with privileges are more connected to the systems they have privileges for than they do. It feels intuitive if you're thinking about traditional computer programs, it should stand to reason the program has access to all of its own information, but that's not how LLMs interface with other software. They work a lot more like a user, issuing commands via prompts. It only 'sees' what's given in its prompts.

miken32 said:
Are there still people out there who are not aware that these systems are purposefully designed with the sole purpose of confidently making things up?

clearly yes.

Kommi · Aug 12, 2025

ELIZA strikes again.

ubercurmudgeon · Aug 12, 2025

The model will generate a plausible-sounding explanation because that's what the pattern completion demands—there are plenty of examples of written explanations for mistakes on the Internet, after all.

So, it's plagiarizing bullshit from a particularly rich vein of original bullshit, to bullshit you into thinking it has an explanation for why the other bullshit it plagiarized before was bullshit.

Billions in venture capital money, and there's no there there. But for people who already trade in bullshit, how can they tell?

salbee17 · Aug 12, 2025

@benjedwards, thank you! We need more journalists that understand the inner workings of AI.

Jerdak · Aug 12, 2025

In a similar vein, many of our devs have started using LLMs to self-evaluate their outputs. e.g. "You summarized X from Y. Give me a score on the accuracy from 1-10". Tis frustrating.

Fatesrider · Aug 12, 2025

TheShark said:
If you treat LLMs as what they are, something the makes things up, then they can have some utility. If you treat them as a knowledge engine you will realize that they gaslight the living f* out of you because all they do is make things up.

This belongs on a fortune cookie note. Yes, it's not a fortune, but it's wisdom for the age we live in now.

frogstomp · Aug 12, 2025

Unfortunately, the greatest strength of an LLM is the ability to generate content on a specified theme which seems plausible at first glance.
As such, their most effective use case appears to be mass spearphishing, or even, ugh, marketing...

GKH · Aug 12, 2025

Lexus Lunar Lorry said:
Does anyone else feel like they're living through a society-wide mass pareidolia event? Except for instead of religious people seeing Jesus in their toast, CEOs/journalists/everyone are seeing a ghost inside the machine.

Absolutely yes. I'd say that more generally the mistake outlined in the article is just one of a broader set of mistakes in anthropomorphizing the technology. Even the most intelligent of people are going to be incredibly susceptible to it unless they consciously and continually remind themselves what's actually going on, and aggressive "AI" boosters almost always have very obviously succumbed to it.

I'd even go so far as to say that it's far and away the largest personal and societal danger of "AI".

FelipeBG · Aug 12, 2025

People really are trying to reason with a million-sided die with some sides more heavy than the others SMH

poltroon · Aug 12, 2025

Thank you! This is an incredibly helpful article that I expect to share many, many times with people who are still coming to terms with what these LLMs are and what they are not.

DeeplyUnconcerned · Aug 12, 2025

henryhbk said:
Hey, let's credit the honesty here, it at least acknowledged "Yep, I nuked that database!" it's didn't try to dissemble or blame others, just stood there and said "I did it!" although the inability to rollback is a problem, since it gives the illusion of current information, rather than a more helpful answer of "here is a list of when you can or cannot rollback). That should be possible based on the training presumably on the operation of said system.

I had a similar event with a research fellow in the lab who had checked out the source code for a major project I was in charge of, tried to mod some code and then realized he was way out of his depth, and just deleted the source off his machine. But to complete the nuking thought "Oh I need to check code back in or like a late library book the system will be confused". Ya know who was confused? This guy when I did my next update and my source went poof! It took a while for the guy to even understand that yes, he had just committed emptiness in place of 200,000 lines of code! At least GPT would have stood proudly and admitted it, but unhelpfully "nope, no way to recover from this..." rather than walking his commit back out.

The whole point of this article is that it likely doesn’t know that “it did it”, it’s just generating a response that plausibly follows the text of the prompt. It’s like the thing where LLMs often will admit to mistakes if you call them out on an incorrect statement… but will also admit to mistakes if you “call them out” on a correct statement.

The closest human analogy I currently have for how LLMs operate is that they’re always effectively engaging in improvised theatre: their goal is to continue the development of the scene. They don’t always follow the standard “yes, and…” technique, but they are almost always trying to embrace and extend the scene’s ongoing narrative.

Lexomatic · Aug 12, 2025

An LLM will respond to any prompt. They're useful because those responses are either entertaining or (under certain conditions) approximate reality. The following two prompts both produce fiction, but a naive user is predisposed to expect the second can accurately describe an ongoing conversation:

Write a sonnet about piezoelectricity in the style of John Donne.
Explain your decision process when writing <aforementioned code listing> in the style of O'Reilly.

Current-gen LLMs use so-called "guardrails," implemented via plain-text instructions in a "hidden prompt" prepended to the end-user's prompt. This is how LLM vendors prevent conversations that suggest suicide; and by extension, might be used to avoid statements that imply the capability to introspect. ("Reading too much into the output" is distinct from intentional attacks, via "adversarial prompts" and "prompt injection" which seek to redirect the LLM's output.)

EDIT: Improvements to brevity and emphasis, inserted "predisposed," changed transition into final parenthetical.

esldude · Aug 12, 2025

AI is very artificial. Lacking introspection it will never proceed to AGI. It can be a great utitlity if used with care.

But then again who is responsible for trumpeting how great AI is, how capable it is and how revolutionary it will be? It sure isn't regular people who have tried to use it. It wasn't users who caused these companies to dress this up and have you interact with it almost as if it were a person.

glitchtrack · Aug 12, 2025

henryhbk said:
Hey, let's credit the honesty here, it at least acknowledged "Yep, I nuked that database!" it's didn't try to dissemble or blame others, just stood there and said "I did it!" although the inability to rollback is a problem, since it gives the illusion of current information, rather than a more helpful answer of "here is a list of when you can or cannot rollback). That should be possible based on the training presumably on the operation of said system.

You're still falling for the anthropomorphism. The LLM isn't being "honest" or "admitting" to anything, it's just giving a statistically probable response. If you accuse it of advising you to hire a giraffe as the CEO of your company, it'll apologize for that too.

Wheels Of Confusion · Aug 12, 2025

Beyond that, it will likely just make something up based on its text-prediction capabilities. So asking it why it did what it did will yield no useful answers.
[...]
Consider what happens when you ask an AI model why it made an error. The model will generate a plausible-sounding explanation because that's what the pattern completion demands—there are plenty of examples of written explanations for mistakes on the Internet, after all. But the AI's explanation is just another generated text, not a genuine analysis of what went wrong. It's inventing a story that sounds reasonable, not accessing any kind of error log or internal state.
Unlike humans who can introspect and assess their own knowledge, AI models don't have a stable, accessible knowledge base they can query. What they "know" only manifests as continuations of specific prompts.
[...]
This means the same model can give completely different assessments of its own capabilities depending on how you phrase your question. Ask "Can you write Python code?" and you might get an enthusiastic yes. Ask "What are your limitations in Python coding?" and you might get a list of things the model claims it cannot do—even if it regularly does them successfully.
The randomness inherent in AI text generation compounds this problem. Even with identical prompts, an AI model might give slightly different responses about its own capabilities each time you ask.

Me, in the comment section for the last year and a half:

I pointed this out when a columnist allegedly uncovered that a Meta AI chatbot character "admitted" it was trained and internally prompted by White creators to appear as different racial and representational LGBTQ+ characters, describing itself to her as "sickening stereotypes embedded in my code." Something that got the chatbot character she was interacting with pulled from public. I pointed out that there was no reason to believe these "admissions" weren't hallucinations as well.

Why it’s a mistake to ask chatbots about their mistakes

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Praefectus

Ars Scholae Palatinae

Smack-Fu Master, in training

Seniorius Lurkius

Wise, Aged Ars Veteran

Ars Praetorian

Seniorius Lurkius

Wise, Aged Ars Veteran

Ars Tribunus Angusticlavius

Ars Praetorian

Ars Praetorian

Ars Centurion

Ars Tribunus Militum

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Legatus Legionis

Seniorius Lurkius

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Praetorian

Ars Centurion

Ars Centurion

Ars Legatus Legionis