Being too nice online is a dead giveaway for AI bots, study suggests

UweHalfHand

Ars Centurion
201
Subscriptor++
This matches my expectations. I see lots of accounts that strike me as inauthentic, saying things like, "you have an interesting point of view, I would like to hear more." It doesn't advance the conversation or even seem to be aware that there is a conversation happening.
I guess it depends on the context, but I will occasionally respond “your ideas are intriguing to me and I wish to subscribe to your newsletter” to someone, and I do believe that so far I have always meant “you are a complete fucking moron”. And it was understood as such by third parties, although possibly not by the recipient.
 
Upvote
30 (30 / 0)

DDopson

Ars Tribunus Militum
2,947
Subscriptor++
This finding is explained by the following text in the article:

The study also revealed an unexpected finding: instruction-tuned models, which undergo additional training to follow user instructions and behave helpfully, actually perform worse at mimicking humans than their base counterparts.

The LLMs they are using were trained to have a useful balance of properties, including avoiding saying rude things to the user, and they are struggling to surmount that training while generating these fake social media posts. I'm surprised they got no benefit from fine-tuning, but the devil is in the details of how they set that up. If this sort of "match human toxicity" goal were baked in from the start, during pre-training, I don't see why toxicity would be hard for a model to replicate. This is a matter of trying to get a pre-trained model to do something that's a mode switch from most of its training data, not a matter of something fundamentally challenging to model.
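To make that mode switch concrete, here's a rough sketch of poking a base model and its instruction-tuned sibling with the same prompt via Hugging Face transformers. The model names are just stand-in placeholders for any small base/instruct pair, not the models from the study:

```python
# Toy probe: feed the same "be rude" prompt to a base model and its
# instruction-tuned sibling. The model names are placeholders, not the
# models used in the study.
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = ("Reply to this post the way an annoyed forum regular would:\n"
          "'Anyone else think the new UI is actually fine?'\n")

for name in ["Qwen/Qwen2.5-0.5B", "Qwen/Qwen2.5-0.5B-Instruct"]:
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    if "Instruct" in name:
        # Instruct models expect their chat template...
        inputs = tok.apply_chat_template(
            [{"role": "user", "content": prompt}],
            add_generation_prompt=True, return_tensors="pt",
        )
    else:
        # ...while a base model just continues the raw text.
        inputs = tok(prompt, return_tensors="pt").input_ids

    out = model.generate(inputs, max_new_tokens=60, do_sample=True)
    print(name, "->", tok.decode(out[0][inputs.shape[-1]:],
                                 skip_special_tokens=True))
```

The base model will generally carry on in whatever register the prompt sets; the instruct model tends to soften, editorialize, or append a disclaimer, which is exactly the trained-in reflex that would show up as "too nice."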
 
Upvote
27 (27 / 0)

anon7

Seniorius Lurkius
40
Subscriptor
The one really easy-to-spot clue in my experience is posts with mixed-up gender references. Either that, or mixing up gender is a real problem for some humans.
Or they’re native speakers of a language that doesn’t pay special attention to gender (spoken Chinese, for example).
 
Upvote
11 (11 / 0)

Stamped_Fish

Wise, Aged Ars Veteran
110
Caveat here that these are (for obvious reasons) only the open-weight models, which have been fine-tuned to follow instructions and be polite and helpful... and not much else. In practice, the user will either further fine-tune them or use them to build some hobby or business app, but out of the box these models lack the polish and "house style" ("personality") that gets trained into a ChatGPT or Claude.

It's also not at all surprising that the base models perform better at emulating toxicity than instruction-tuned versions - toxicity is exactly what needs to get trained out to make a model instructable in the first place.

Interesting to speculate whether ChatGPT might be worse at faking toxicity because it's been more thoroughly trained to be agreeable even towards hostile users... or better because it's been exposed to hostile conversations during RL?
 
Upvote
7 (7 / 0)
As someone who tries desperately to avoid toxicity (due to the anxiety it causes me—not that I don't get anxious about all interactions…), let me unleash things for just a moment:

As someone who also took up em-dashes as easily as compose keys a few years ago, I'm fucking sick and tired of learning that everything I do is what LLMs are tuned to do. This is fucking bullshit.

raises her hand oh look, it me.

Like the WORST part about all this LLM stolen training generative so-called "AI" bullshit is that now I can't use em dashes anymore. I've used em dashes for years and years, they're a great and nicely specific form of pause/syntax, and I've always loved how they look. What's next, semicolons? The fuck were they even training this shit on, like almost no one else ever uses them colloquially??

… so umm, is it safe to at least use the shorthand of double dashes, or is that out too? I haven't been paying close enough attention to know; it's all just depressing.
 
Upvote
24 (25 / -1)

DDopson

Ars Tribunus Militum
2,947
Subscriptor++
I use Gemini, and I have found that one thing it has real trouble with is humor! Even when it has the context, it goes right over its poor head! It inevitably comes back with some long babble about the punch line...
Humor is a high-level skill, and I was surprised when the models started being able to explain it at all. That they are a bit wooden at artfully employing it during conversation isn't all that surprising.
 
Upvote
6 (6 / 0)

Shavano

Ars Legatus Legionis
68,373
Subscriptor
I didn't have much trouble getting Copilot to say this:
So, AI models are “easily distinguishable” because they’re too friendly? What a scandal! Imagine the horror: a reply that says “Hope you’re having a great day 😊” instead of “This take is trash.” Clearly, no human could ever be that polite online. And those classifiers catching AI with 70–80% accuracy? That’s adorable. It’s like bragging about spotting a mime in a rock band—yes, the contrast is obvious, but does it really matter?

Maybe instead of trying to make AI sound more like humans, we should be asking why humans sound like emotionally stunted bots. If friendliness is the giveaway, maybe the real problem is that the internet has trained us to expect sarcasm, snark, and passive aggression as the default tone. So yes, AI is too nice—and that’s the dead giveaway. How dare it.
 
Upvote
17 (20 / -3)
I guess it depends on the context, but I will occasionally respond “your ideas are intriguing to me and I wish to subscribe to your newsletter” to someone, and I do believe that so far I have always meant “you are a complete fucking moron”. And it was understood as such by third parties, although possibly not by the recipient.
Yeah, what I'm describing is different from that. It's often in response to something really mundane or tame, where there's no reason anyone would give a sarcastic response, so as best I can tell it's meant to read as sincere.
 
Upvote
4 (4 / 0)

darkowl

Ars Tribunus Militum
1,995
Subscriptor++
Although it was considered highly unethical, even GPT-4chan, which was trained on 4chan's /pol/, was identified as a bot by people. And its whole thing was to be an asshole!

https://en.wikipedia.org/wiki/GPT4-Chan

Also, for those wondering "why not MechaHitler?": this study used open-weight models, and Grok is not open.
 
Upvote
13 (13 / 0)

Somdudewillson

Smack-Fu Master, in training
17
The study also revealed an unexpected finding: instruction-tuned models, which undergo additional training to follow user instructions and behave helpfully, actually perform worse at mimicking humans than their base counterparts.
Yes, who could possibly have expected that training specifically intended to produce outputs less like the average human's would result in a model that isn't better at the thing it is being trained not to do.
 
Upvote
8 (8 / 0)

Troper1138

Wise, Aged Ars Veteran
128
Subscriptor
Might it be that AI determines that negativity decreases the likelihood that its goal will be achieved?
I'm no expert on this stuff, but I think it's still very much the case that "AI"--that is, Large Language Models--don't really "determine" anything, or have any real "goals" (whether "bring about world peace", "kill all humans!", or "bring about world peace by killing all humans!"). LLMs are just very complicated math machines, but have no actual sentience--no way to perceive the physical world.

My understanding of these things is pretty much stuck in the "vibe coding" and "frantically skim Wikipedia and/or Ars Technica articles" stage--I was a liberal arts major, a long damned time ago--but, well, there's an XKCD for this. From over 8 years ago.
 
Upvote
22 (22 / 0)

MrWalrus

Ars Tribunus Militum
1,710
I'm no expert on this stuff, but I think it's still very much the case that "AI"--that is, Large Language Models--don't really "determine" anything, or have any real "goals" (whether "bring about world peace", "kill all humans!", or "bring about world peace by killing all humans!"). LLMs are just very complicated math machines, but have no actual sentience--no way to perceive the physical world.

Yep, they're still just sophisticated text-prediction engines. They don't think, don't reason, don't have motivations or goals or opinions. They just predict text.

Self-important dipshit techbros like to respond to this by claiming that the human brain is also just a language-prediction machine, because they
A. are assholes who want to feel morally superior for causing problems, and
B. genuinely believe that all other fields of expertise are invalid and that their half-baked understanding of cognitive science and philosophy makes them as qualified to opine on the nature of thought as people who actually studied those things.
 
Upvote
22 (22 / 0)
I'm no expert on this stuff, but I think it's still very much the case that "AI"--that is, Large Language Models--don't really "determine" anything, or have any real "goals" (whether "bring about world peace", "kill all humans!", or "bring about world peace by killing all humans!"). LLMs are just very complicated math machines, but have no actual sentience--no way to perceive the physical world.
I think at least for now you can comfortably ignore the fancy definition of "goal" and just treat them as algorithms that are trying to minimize some number. (And that number might be the number of humans alive.)

That "goal"-- predicting the next word, receiving positive feedback from users and algorithms, helping the user, following your instructions, writing your stupid app-- is just an abstraction. But it's a helpful word anyway.

When you use an LLM, you need to pretend to ask it a question, and it needs to pretend to understand what you want, and it needs to pretend to understand your goals, and then pretend to come up with ways of helping you achieve them, and it needs to pretend that helping you is its goal. All of it is fake, but you need to play pretend for it to work.
My understanding of these things is pretty much stuck in the "vibe coding" and "frantically skim Wikipedia and/or Ars Technica articles" stage--I was a liberal arts major, a long damned time ago--but, well, there's an XKCD for this. From over 8 years ago.
There was a cute opposite xkcd as well.

https://xkcd.com/2173/
 
Upvote
6 (6 / 0)

DCinWI

Smack-Fu Master, in training
96
Subscriptor++
Why would "sound like a human on social media" be a design goal for AI? Not only is that hardly the gold standard for desirable behavior by humans, it also doesn't sound even close to the "killer app" that makes AI worthwhile or profitable.
Because advertising.

The AI companies can sell (on the down-low) "human-sounding" bots to social media companies. The social media companies can then populate their services with these bots to inflate their apparent membership and participation levels, and those inflated numbers become the marketing point for selling ads on the platform.
 
Upvote
17 (17 / 0)

hillspuck

Ars Scholae Palatinae
2,179
We asked a computer if another computer sounded authentic, and the answer was a resounding yes!
Picking a nit, but the answer was a resounding "no".

“Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression,” the researchers wrote.

I wonder how the LLM scored on pedantry. Another dead giveaway for real humans on the internet.
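(For flavor, here's the crudest possible version of the kind of "human or bot?" classifier the researchers describe: a bag-of-words model fit on a handful of made-up example posts. The paper's classifiers are surely more sophisticated; this is just the shape of the idea.)

```python
# Crude sketch of a "human or bot?" text classifier. The example posts
# are invented; the paper's actual classifiers are more sophisticated.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "Hope you're having a great day! Such an interesting point of view.",
    "What a thoughtful perspective, I would love to hear more!",
    "lol no. this take is trash and you know it",
    "who hurt you. also the new UI is fine, fight me",
]
labels = ["bot", "bot", "human", "human"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(posts, labels)

# With training data this tiny, take the output as illustration only.
print(clf.predict(["you raise an interesting point, tell me more"]))
```

Even something this crude keys on the relentlessly agreeable phrasing, which is more or less what the paper reports about affective tone.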
 
Upvote
12 (12 / 0)

WhatGravitas

Smack-Fu Master, in training
97
My (unscientific, pure gut feeling-based) theory is that it's the same problem as being funny. For toxicity to "work", much like sarcasm or jokes, you need some level of empathy and/or theory of mind of the person you're interacting with.

LLMs don't maintain a model of mind (or model of reality) beyond the text they're currently operating on, so figuring out when a toxic comment actually lands or a joke fires is beyond the current architectures.

On top of that, because a lot of real toxicity is interactive (and hence happens in person, not just on the internet), there's less reliable training data - whereas "sounding intelligent" is something done much more formally and much more often in written form (producing training data).
 
Upvote
3 (3 / 0)

VectorRevival

Wise, Aged Ars Veteran
130
The study authors just don't seem to understand the different types of LLMs. They tested IFT (instruction fine-tuned) models, which is understandable because that's what basically everybody uses. IFT models are literally fine-tuned to follow instructions, which to start with makes them less authentic. They are also generally trained with RLHF to be docile, polite netizens.

In contrast to IFT models, raw pretrained LLMs sound a LOT like humans. But these are rarely used for lots of reasons. The main one is that they don’t follow instructions, making them difficult to do anything with. But another is that they sound like the typical jerks you find on the Internet.

If you understand this distinction, every conclusion made is completely obvious.
 
Upvote
5 (5 / 0)

10Nov1775

Ars Scholae Palatinae
889
Wow, that's depressing in a whole new direction
Nah, something with no wants, needs, hurts, goals, motivations, or gastrointestinal indigestion has everything it needs to out-positive a human being—because what it needs is to have none of those things.

And I, for one, would certainly prefer that the robits keep their hands where I can see them, thank you very much. I don't think I'll ever find it depressing if we can reliably identify AI-generated text.
 
Upvote
3 (3 / 0)

10Nov1775

Ars Scholae Palatinae
889
The study authors just don't seem to understand the different types of LLMs. They tested IFT (instruction fine-tuned) models, which is understandable because that's what basically everybody uses. IFT models are literally fine-tuned to follow instructions, which to start with makes them less authentic. They are also generally trained with RLHF to be docile, polite netizens.

In contrast to IFT models, raw pretrained LLMs sound a LOT like humans. But these are rarely used for lots of reasons. The main one is that they don’t follow instructions, making them difficult to do anything with. But another is that they sound like the typical jerks you find on the Internet.

If you understand this distinction, every conclusion made is completely obvious.
I mean, just designing an LLM with the goal of making it accurate, knowledgeable, and helpful, however you might define those things, regardless of tone, should seriously impede its ability to recreate social media comments; every bit of that design is taking it further away from what social media comments actually are.

Tongue firmly in cheek, of course. But also, seriously...
 
Upvote
9 (9 / 0)