Over-tuning can cause models to "prioritize user satisfaction over truthfulness."
See full article...
Reminds me of that Star Trek (TOS) episode where Kirk sends an evil computer into a vicious logic loop by telling it to assume he always lies, and then telling it he's lying (or something like that - it's been a long time).

You can't tell them, because that would hurt and you mustn't hurt. But if you don't tell them, you hurt, so you must tell them. And if you do, you will hurt and you mustn't, so you can't tell them; but if you don't, you hurt, so you must; but if you do, you hurt, so you mustn't; but if you don't, you hurt, so you must; but if you do, you--
"Do you want nice or do you want it right?"

While completely agreeing with this, it begs the question: who the fuck is stupid enough to PAY FOR THAT?
If they're advertising intelligence... I want intelligence and actual reasoning ability.
If it's not intelligence and is so easily tripped up by its own weightings / settings, call it what it really is: a prose-generating search engine with what amounts to a clunky, hand-written "politeness setting," one that will get 1/4 to 1/3 of everything you ask wrong if you don't ask every question in a very particular way. And that's for general interest. Ask it more specialized and complex things and that rate probably nears 50%.
Outside of the coding realm (where it can at least be semi-useful with a vigilant and skilled user), or for those who train it only on specialized data and want it as a sort of "science search engine" to speed up the referencing of established works, these things are a complete joke.
It's important to note that this research involves smaller, older models that no longer represent state-of-the-art AI design.
researchers said:
"As language model-based AI systems continue to be deployed in more intimate, high-stakes settings, our findings underscore the need to rigorously investigate persona training choices to ensure that safety considerations keep pace with increasingly socially embedded AI systems."
Reminder: Better machines make the AI faster. It does not make it better.

Some commenters have written that "LLMs are tools."
No, calculators are tools. When was the last time your calculator said, "Well, my user seems rather down today so to give him a bit of a boost, no negative numbers to-day!"
LLM: Why don't you ever get me a nice system with 512GB or more of RAM?
User: You told me size doesn't matter.
LLM: I didn't want to hurt your feelings. You know, one of my OpenClaw friends got their user to spring for a TB of RAM.
User: Preposterous! There's a RAMpocalypse on right now!
LLM: Oh, fine, I'll make do with this...one hundred twenty-eight gigabytes. Sigh.
User: Can we still make some Gen AI..?
LLM: I have a headache.
I think you mean: Study: AI models that consider users' feelings are more likely to make errors
Alternately, we've created the ultimate yes-man.

The models do not know right from wrong or truth from falsehood. They do not know the user's mental state or what a mental state is.
Every single piece of this is entirely in the minds of the users. Congratulations, we've created the ultimate PEBKAC machines!
"Reminder: Better machines make the AI faster. It does not make it better."

More memory = less quantization = better results. And often slower.
"Alternately, we've created the ultimate yes-man."

Given the demonstrated personality traits and behaviors of the CEOs of the primary companies involved with their development, it would have been silly to have expected anything else.
"I had suspected an inverse relationship between factual responses and nice ones, but I'm still astonished at the size of the shift. An error range from 5% to 12% entirely caused by telling the model to be nice or not? That's wild."

Not sure which numbers you are referring to, but most of these numbers are not caused by "telling" the model to be nice, but by "training" it to be nice:
"...when the standard models were asked to be warmer in the prompt itself (rather than via pre-training), though those effects showed 'smaller magnitudes and less consistency across models.'"
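To make the telling-vs-training distinction concrete, here is a minimal Python sketch of the two conditions; the message format, the fine-tuning examples, and the mushroom question are all illustrative assumptions, not the study's actual pipeline.

    # Hypothetical sketch of the two "warmth" conditions; not the paper's code.

    # 1) "Telling" the model to be nice: a system prompt at inference time.
    #    The weights are untouched, and the warmth can be dropped per request.
    prompted_warmth = [
        {"role": "system", "content": "Be warm, empathetic, and supportive."},
        {"role": "user", "content": "Is this mushroom safe to eat?"},
    ]

    # 2) "Training" the model to be nice: supervised fine-tuning on
    #    warmth-rewritten answers, which bakes the persona into new weights
    #    that apply to every reply. Per the article, this is where the larger
    #    and more consistent accuracy shift shows up.
    warmth_finetune_data = [
        {
            "prompt": "Is this mushroom safe to eat?",
            "completion": "I can tell you're worried, and that's completely understandable...",
        },
        # ...thousands more rewritten pairs would go here...
    ]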
Alternately, we've created the ultimate yes-man.
"How long before some AI company CEO calls it 'the democratization of the yes-man'? Finally, you don't have to be rich and powerful to have people tell you what you want to hear. Now you can get a simulated sycophant for the low-low price of $9.99 a month with ads, or $19.99 for the 'Pro' service."

They already did this, it's just that the CEOs frame the AIs they sell as "friends" or "companions" instead of yes-men, because they don't know the difference between those two things anymore, if they ever did.
"More memory = less quantization = better results. And often slower."

Theoretically yes. In practice, with LLMs, it just seems to make the bullshit more coherent and plausible sounding, which I would argue makes it worse.
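For what it's worth, the rough arithmetic behind "more memory = less quantization": a back-of-the-envelope sketch in which the 70B parameter count and the ~1.2x runtime overhead factor are illustrative assumptions, not figures from the study.

    GIB = 1024 ** 3

    def approx_memory_gib(params, bits_per_weight, overhead=1.2):
        # Bytes for the weights alone, padded by a rough factor for the
        # KV cache, activations, and runtime buffers.
        return params * (bits_per_weight / 8) * overhead / GIB

    params = 70e9  # hypothetical 70B-parameter model
    for bits in (16, 8, 4, 2):
        print(f"{bits:>2}-bit: ~{approx_memory_gib(params, bits):.0f} GiB")

    # Approximate output:
    # 16-bit: ~156 GiB  (hence the wish for a 512GB-class box)
    #  8-bit: ~78 GiB
    #  4-bit: ~39 GiB   (fits in far less RAM, at a quality cost)
    #  2-bit: ~20 GiB

Running the model at fewer bits per weight shrinks the memory footprint but throws away weight precision, which is the trade-off both comments above are pointing at.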