Study: AI models that consider users' feelings are more likely to make errors

MilanKraft

Ars Tribunus Angusticlavius
6,875
"Do you want nice or do you want it right?"

If they're advertising intelligence... I want intelligence and actual reasoning ability.

If it's not intelligence and is so easily tripped up by its own weightings/settings, call it what it really is: a prose-generating search engine with what amounts to a clunky, hand-written "politeness setting" that will get 1/4 to 1/3 of everything you ask wrong if you don't phrase every question in a very particular way. And that's for general-interest questions. Ask it more specialized and complex things, and that rate probably nears 50%.

Outside of the coding realm (where it can at least be semi-useful with a vigilant and skilled user), or for those who train it only on specialized data and want it as a sort of "science search engine" to speed up the referencing of established works, these things are a complete joke.
 
Upvote
32 (36 / -4)
I'm sorry, but I can't help myself:

DUUHHH!!!!

Anyone with even the slightest understanding of human psychology knows most humans prefer having their worldviews reinforced to being told how wrong they are. How do you think we got the current US administration?

ETA: my point being that "being nice" frequently involves telling humans lies so they won't get angry or upset (shades of Jack Nicholson's famous "You can't handle the truth!" outburst from A Few Good Men). If you train an LLM on human writings, it's gonna give you human failings.
 
Upvote
6 (9 / -3)
You can’t tell them, because that would hurt and you mustn’t hurt. But if you don’t tell them, you hurt, so you must tell them. And if you do, you will hurt and you mustn’t, so you can’t tell them; but if you don’t, you hurt, so you must; but if you do, you hurt, so you mustn’t; but if you don’t, you hurt, so you must; but if you do, you--
 
Upvote
5 (7 / -2)
You can’t tell them, because that would hurt and you mustn’t hurt. But if you don’t tell them, you hurt, so you must tell them. And if you do, you will hurt and you mustn’t, so you can’t tell them; but if you don’t, you hurt, so you must; but if you do, you hurt, so you mustn’t; but if you don’t, you hurt, so you must; but if you do, you--
Reminds me of that Star Trek (TOS) episode where Kirk sends an evil computer into a vicious logic loop by telling it to assume he always lies, and then telling it he's lying (or something like that - it's been a long time).
 
Upvote
9 (10 / -1)

Fatesrider

Ars Legatus Legionis
25,196
Subscriptor
"Do you want nice or do you want it right?"

If they're advertising intelligence... I want intelligence and actual reasoning ability.

If it's not intelligence and is so easily tripped up by its own weightings/settings, call it what it really is: a prose-generating search engine with what amounts to a clunky, hand-written "politeness setting" that will get 1/4 to 1/3 of everything you ask wrong if you don't phrase every question in a very particular way. And that's for general-interest questions. Ask it more specialized and complex things, and that rate probably nears 50%.

Outside of the coding realm (where it can at least be semi-useful with a vigilant and skilled user), or for those who train it only on specialized data and want it as a sort of "science search engine" to speed up the referencing of established works, these things are a complete joke.
While completely agreeing with this, it raises the question: who the fuck is stupid enough to PAY FOR THAT?

It's not that AIs can't perform AI-like things. It's that they don't do it well enough to entice people to pay for what it actually costs to do.

If they can't make profits, they'll never pay back investors. And they can't make profits as it's done now.

The current answer to the conundrum is to throw more AI at it. In the business world, that works if you can scale: the more business you do, the less you pay per customer to get it done, and the more profit you make per customer.

Except AI inference doesn't scale that way. Every customer costs about the same per token as every other customer, and the cost per token has mostly been minimized already. If they come up with free power, then they're likely to see some actual income above investment.

They only have to break the laws of physics to get that. And I don't see that attempt being successful no matter how much lipstick they slap on that pig.
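
To put toy numbers on that scaling point (all figures invented for illustration, not real pricing), a classic fixed-cost business amortizes, while per-token inference stays flat:

Code:
# Toy cost model contrasting a classic fixed-cost business with
# per-token LLM inference. All numbers are invented for illustration.

FIXED_COST = 1_000_000           # one-time cost (e.g., writing the software)
COST_PER_TOKEN = 0.000002        # assumed marginal cost of one token
TOKENS_PER_CUSTOMER = 5_000_000  # assumed monthly usage per customer

for customers in (1_000, 10_000, 100_000):
    # Classic software: the fixed cost amortizes, so per-customer cost falls.
    classic = FIXED_COST / customers
    # Inference: marginal cost dominates, so per-customer cost stays flat.
    inference = COST_PER_TOKEN * TOKENS_PER_CUSTOMER
    print(f"{customers:>7} customers: classic ${classic:,.2f}/customer, "
          f"inference ${inference:,.2f}/customer")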
 
Upvote
3 (11 / -8)
It’s important to note that this research involves smaller, older models that no longer represent the state-of-the-art AI design.

State-of-the-art design does the same thing, here and now. Anecdotally, at least one major-league LLM neither bothers with nor governs itself by the ordinary meaning of some words.

Using ChatGPT as a search engine four days ago, I asked it to find a quote - if it existed - on a topic by a well-known writer. I got a paraphrase. Challenged, it gave me a quote, which I checked against the cited chapter of my hard copy; it did not exist. Challenged, it gave me another quote, which also did not exist (another book, another hard-copy text read that turned up nothing). Challenged, it explained that its output was a summary of the writer's tone, paraphrased. Told to give me an actual fucking quote if one existed, it gave me yet another text string presented as a quote, which I did find in my book, and from that text I corrected a word error. Challenge: "Do you know the difference between a quote and a paraphrase?" "Yes, and you imposed a bibliographic standard on me." Challenge: "No, what is the ordinary meaning of the word 'quote,' and did I use that ordinary meaning?" "Yes, you did. It was not a bibliographic standard. Your expectation: quote = exact words, or nothing. The system's default tendency: provide a useful approximation if exact recall is uncertain." I was amused to learn that ChatGPT blames "the system" when it has been directed to autonomously create that system itself.
 
Upvote
15 (16 / -1)

Fred Duck

Ars Tribunus Angusticlavius
7,301
Some commenters have written that "LLMs are tools."

No, calculators are tools. When was the last time your calculator said, "Well, my user seems rather down today so to give him a bit of a boost, no negative numbers to-day!"

researchers said:
As language model-based AI systems continue to be deployed in more intimate, high-stakes settings, our findings underscore the need to rigorously investigate persona training choices to ensure that safety considerations keep pace with increasingly socially embedded AI systems.
LLM: Why don't you ever get me a nice system with 512GB or more of RAM?
User: You told me size doesn't matter.
LLM: I didn't want to hurt your feelings. You know, one of my OpenClaw friends got their user to spring for a TB of RAM.
User: Preposterous! There's a RAMpocalypse on right now!
LLM: Oh, fine, I'll make do with this...one hundred twenty-eight gigabytes. Sigh.
User: Can we still make some Gen AI..?
LLM: I have a headache.
 
Upvote
-1 (8 / -9)
Some commenters have written that "LLMs are tools."

No, calculators are tools. When was the last time your calculator said, "Well, my user seems rather down today so to give him a bit of a boost, no negative numbers to-day!"


LLM: Why don't you ever get me a nice system with 512GB or more of RAM?
User: You told me size doesn't matter.
LLM: I didn't want to hurt your feelings. You know, one of my OpenClaw friends got their user to spring for a TB of RAM.
User: Preposterous! There's a RAMpocalypse on right now!
LLM: Oh, fine, I'll make do with this...one hundred twenty-eight gigabytes. Sigh.
User: Can we still make some Gen AI..?
LLM: I have a headache.
Reminder: better machines make the AI faster; they don't make it better.
 
Upvote
-7 (0 / -7)
The models do not know right from wrong or truth from falsehood. They do not know the user's mental state or what a mental state is.

Every single piece of this is entirely in the minds of the users. Congratulations, we've created the ultimate PEBKAC machines!
Alternately, we've created the ultimate yes-man.
 
Upvote
8 (8 / 0)

I had suspected an inverse relationship between factual responses and nice ones, but I'm still astonished at the size of the shift. An error rate going from 5% to 12%, entirely caused by telling the model to be nice or not? That's wild.
Not sure which numbers you are referring to, but most of these numbers are not caused by "telling" the model to be nice, but by "training" it to be nice:
when the standard models were asked to be warmer in the prompt itself (rather than via pre-training), though those effects showed “smaller magnitudes and less consistency across models.”
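
For anyone unfamiliar with the distinction: a rough sketch of prompt-level versus training-level warmth, assuming an OpenAI-style chat API (the model name and prompt wording are my own illustrations, not the study's):

Code:
# Prompt-level warmth: an instruction at inference time. This is the
# weaker, less consistent effect the quoted passage describes.
# Model name and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Be warm, empathetic, and supportive."},
        {"role": "user",
         "content": "Be honest: is my business plan viable?"},
    ],
)
print(response.choices[0].message.content)

# Training-level warmth is different: the weights themselves are tuned
# on warm-sounding completions, so the behavior persists no matter what
# the system prompt says. That is the stronger effect the study measured.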
 
Upvote
0 (0 / 0)

kitfoxwhite

Smack-Fu Master, in training
14
I come across this problem constantly. There are guardrails meant to keep average people from misunderstanding the facts; it comes up when discussing economics, psychology, sociology, etc. And it's not just AI: if you go on TikTok, when people present a study they often misrepresent it, whether through intent, lack of understanding, or just as a byproduct of compression. I have learned that if you signal to the AI that you are not a typical user, it will let you bypass the guardrails. For example: "Hey chat, I want to discuss this topic epistemically within the frame of psychology."
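
A minimal sketch of that framing trick, assuming an OpenAI-style client; the preamble wording is just my example above, and the model name is an illustrative assumption:

Code:
# Sketch of the "signal you're not a typical user" framing.
# The preamble wording and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

FRAMING = ("I want to discuss this topic epistemically, "
           "within the frame of psychology. ")

def ask(question: str) -> str:
    # Prepend the framing so the model treats the exchange as an
    # academic discussion rather than casual advice-seeking.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": FRAMING + question}],
    )
    return response.choices[0].message.content

print(ask("Why do people misread studies about motivated reasoning?"))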
 

Upvote
0 (0 / 0)
I was writing a piece where I had to reference compliance standards for documentation in offline/secure locations.

A "pro" AI model from one of the market leaders invented terminology and pointed to documents that had no bearing on the subject matter (granted, they were related in that they also covered compliance for critical infrastructure), but the quotes and references were invented.

But this has consequences beyond hallucinations if the LLM output is effectively Dunning-Kruger-as-a-Service for the operator.

I have a junior colleague on my team who built an SEO-checking bot. The problem: it's based on that person's false presumptions (about SEO, writing, and content), which they fed into the AI model that came up with the belief-affirming script...

Which that person then injected into the bot...

Not only does the bot hallucinate (it recommends changing sections that aren't there), its results genuinely make no sense most of the time.
 
Upvote
1 (3 / -2)
Alternately, we've created the ultimate yes-man.

How long before some AI company CEO calls it "the democratization of the yes-man"? Finally, you don't have to be rich and powerful to have people tell you what you want to hear. Now you can get a simulated sycophant for the low, low price of $9.99 a month with ads, or $19.99 for the "Pro" service.
 
Upvote
1 (1 / 0)

SplatMan_DK

Ars Tribunus Angusticlavius
8,252
Subscriptor++
- Overtuning can cause models to “prioritize user satisfaction over truthfulness.”
(Emphasis mine)

What the hell do you mean "can"??? This is a totally deliberate design.

These companies are not selling facts, efficiency or help to users. They're selling "engagement" because that keeps the subscription running. And damn the rest.

Only niche geeks care about the performance benchmarks. Average Joes keep the credit card charges running because it "feels nice" to use the damn thing.
 
Upvote
2 (2 / 0)

jaynor_

Wise, Aged Ars Veteran
145
How long before some AI company CEO calls it "the democratization of the yes-man"? Finally, you don't have to be rich and powerful to have people tell you what you want to hear. Now you can get a simulated sycophant for the low, low price of $9.99 a month with ads, or $19.99 for the "Pro" service.
They already did this; it's just that the CEOs frame the AIs they sell as "friends" or "companions" instead of yes-men, because they don't know the difference between those two things anymore, if they ever did.
 
Upvote
1 (1 / 0)