Being too nice online is a dead giveaway for AI bots, study suggests

UweHalfHand · Nov 7, 2025

Delphine Unseen said:
This matches my expectations, I see lots of accounts that strike me as inauthentic, saying things like, "you have an interesting point of view, I would like to hear more." It doesn't advance conversation or even seem to be aware that there is a conversation happening.

I guess it depends on the context, but I will occasionally respond “your ideas are intriguing to me and I wish to subscribe to your newsletter” to someone, and I do believe that so far I have always meant “you are a complete fucking moron”. And it was understood as such by third parties, although possibly not by the recipient.

Troper1138 · Nov 7, 2025

Man! Fuckin' clankers, amirite? <reaches hand into pants, "adjusts" self, then spits on the ground>

telenoar · Nov 7, 2025

allears said:
It's not so much that humans are unrelentingly negative online. It's just that (most) humans can switch in and out of sarcasm depending on context, something an LLM probably couldn't do.

Yeah, uh huh.

DDopson · Nov 7, 2025

This finding is explained by the following text in the article:

The study also revealed an unexpected finding: instruction-tuned models, which undergo additional training to follow user instructions and behave helpfully, actually perform worse at mimicking humans than their base counterparts.

The LLMs they are using were trained to have a useful balance of properties, including avoiding saying rude things to the user, and they are struggling to surmount that training while generating these fake social media posts. I'm surprised they got no benefit from fine-tuning, but the devil is in the details of how they set that up. If this sort of "match human toxicity" goal were baked in from the start, during pre-training, I don't see why toxicity would be hard for a model to replicate. This is a matter of trying to get a pre-trained model to do something that's a mode switch from most of its training data, not of something that's fundamentally challenging to model.

Eldorito · Nov 7, 2025

telenoar said:
They didn't test Grok.

Grok just sounds like Musk, and figuring out whether Musk is actually human is a task no one wants to take on.

anon7 · Nov 7, 2025

YetAnotherAnonymousAppellation said:
The one really easy to spot clue in my experience is posts with mixed up gender references. Either that or mixing up gender is a real problem for some humans.

Or they’re native speakers of a language that doesn’t pay special attention to gender (spoken Chinese, for example).

Stamped_Fish · Nov 7, 2025

Caveat here that these are (for obvious reasons) only the open-weight models, which have been fine-tuned to follow instructions, be polite and helpful... and not much else. In practice, the user will either further fine-tune them, or use them to build some hobby or business app, but these models out-of-the-box lack the polish and "house style" ("personality") that gets trained into a ChatGPT or Claude.

It's also not at all surprising that the base models perform better at emulating toxicity than instruction-tuned versions - toxicity is exactly what needs to get trained out to make a model instructable in the first place.

Interesting to speculate whether ChatGPT might be worse at faking toxicity because it's been more thoroughly trained to be agreeable even towards hostile users... or better because it's been exposed to hostile conversations during RL?

taswyn · Nov 7, 2025

FangsFirst said:
As someone who tries desperately to avoid toxicity (due to the anxiety it causes me—not that I don't get anxious about all interactions…), let me unleash things for just a moment:

As someone who also took up em-dashes with the ease of starting to use compose keys a few years ago, I'm fucking sick and tired of learning everything I do is what LLMs are tuned to do. This is fucking bullshit.

raises her hand oh look, it me.

Like the WORST part about all this LLM stolen training generative so-called "AI" bullshit is that now I can't use em dashes anymore. I've used em dashes for years and years, they're a great and nicely specific form of pause/syntax, and I've always loved how they look. What's next, semicolons? The fuck were they even training this shit on, like almost no one else ever uses them colloquially??

… so umm, is it safe to at least use the short hand of double dashes, or is that out too? I haven't been paying close enough attention to know, it's all just depressing.

garykirk · Nov 7, 2025

I use Gemini, and I have found that one thing it has real trouble with is humor! Even when it has the context, it goes right over its poor head! It inevitably comes back with some long babble about the punch line...

DDopson · Nov 7, 2025

garykirk said:
I use Gemini, and I have found that one thing it has real trouble with is humor! Even when it has the context, it goes right over its poor head! It inevitably comes back with some long babble about the punch line...

Humor is a high level skill, and I was surprised when the models started being able to explain it at all. That they are a bit wooden at artfully employing it during conversation isn't all that surprising.

Shavano · Nov 7, 2025

I didn't have much trouble getting Copilot to say this:

So, AI models are “easily distinguishable” because they’re too friendly? What a scandal! Imagine the horror: a reply that says “Hope you’re having a great day ” instead of “This take is trash.” Clearly, no human could ever be that polite online. And those classifiers catching AI with 70–80% accuracy? That’s adorable. It’s like bragging about spotting a mime in a rock band—yes, the contrast is obvious, but does it really matter?

Maybe instead of trying to make AI sound more like humans, we should be asking why humans sound like emotionally stunted bots. If friendliness is the giveaway, maybe the real problem is that the internet has trained us to expect sarcasm, snark, and passive aggression as the default tone. So yes, AI is too nice—and that’s the dead giveaway. How dare it.

DDopson · Nov 7, 2025

Shavano said:
I didn't have much trouble getting Copilot to say this:

That bot might have one of the better takes in this thread.

Delphine Unseen · Nov 7, 2025

UweHalfHand said:
I guess it depends on the context, but I will occasionally respond “your ideas are intriguing to me and I wish to subscribe to your newsletter” to someone, and I do believe that so far I have always meant “you are a complete fucking moron”. And it was understood as such by third parties, although possibly not by the recipient.

Yeah what I'm describing is different from that, it's often in response to something really mundane or tame where there's no reason why someone would give a sarcastic response, so best I can tell it's meant to read as sincere.

Thud2 · Nov 7, 2025

Might it be that AI determines that negativity decreases likeliness that it's goal will be achieved?

darkowl · Nov 8, 2025

Although considered highly unethical, even GPT-4chan which was trained on 4chan’s /pol was identified as a bot by people. And its whole thing was to be an asshole!

https://en.wikipedia.org/wiki/GPT4-Chan

Also for those wondering why not MechaHitler? This specific case used open weight models. Grok is not open.

onefang · Nov 8, 2025

with toxicity scores consistently lower than authentic human replies across all three platforms.

To counter this deficiency,

Um, they consider not being toxic enough a deficiency that has to be countered?

onefang · Nov 8, 2025

Mustachioed Copy Cat said:
Well, that’s easy enough, everyone just has to end their correspondence with “and I fucked your mother last night”.

At long last, we have a way to identify the humans from the clankers.

AI fymln

Somdudewillson · Nov 8, 2025

The study also revealed an unexpected finding: instruction-tuned models, which undergo additional training to follow user instructions and behave helpfully, actually perform worse at mimicking humans than their base counterparts.

Yes, who could have possibly expected that the training specifically intended to provide outputs less like the average human results in a model that isn't better at the thing it is being trained not to do.

gilgoomesh · Nov 8, 2025

I'd like to applaud RichVintage (via Getty Images) for a fantastic image at the top of the article.

SubWoofer2 · Nov 8, 2025

My casual negativity is unmatched?!

Today has been a good day.

JoHBE · Nov 8, 2025

Question: IS there a legitimate reason/need to make Chatbot output maximally indistinguishable from real human output? I can't really think of any.

Troper1138 · Nov 8, 2025

Thud2 said:
Might it be that AI determines that negativity decreases likeliness that it's goal will be achieved?

I'm no expert on this stuff, but I think it's still very much the case that "AI"--that is, Large Language Models--don't really "determine" anything, or have any real "goals" (whether "bring about world peace", "kill all humans!", or "bring about world peace by killing all humans!"). LLMs are just very complicated math machines, but have no actual sentience--no way to perceive the physical world.

My understanding of these things is pretty much stuck in the "vibe coding" and "frantically skim Wikipedia and/or Ars Technica articles" stage--I was a liberal arts major, a long damned time ago--but, well, there's there's an XKCD for this. From over 8 years ago.

TheOldChevy · Nov 8, 2025

toxicity is harder to fake than intelligence

It really depends on what "intelligence" means. Is it IQ? Is it the ability to pass tests in school? Or is it that ability to leave in a good understanding with other "intelligent" beings?

MrWalrus · Nov 8, 2025

Troper1138 said:
I'm no expert on this stuff, but I think it's still very much the case that "AI"--that is, Large Language Models--don't really "determine" anything, or have any real "goals" (whether "bring about world peace", "kill all humans!", or "bring about world peace by killing all humans!"). LLMs are just very complicated math machines, but have no actual sentience--no way to perceive the physical world.

Yep, they're still just sophisticated text-prediction engines. They don't think, don't reason, don't have motivations or goals or opinions. They just predict text.

Self-important dipshit techbros like to respond to this by claiming that the human brain is also just a language-prediction machine, because they
A. are assholes who want feel morally superior for causing problems, and
B. genuinely believe that all other fields of expertise are invalid and that their half-baked understanding of cognitive science and philosophy makes them as qualified to opine on the nature of thought as people who actually studied those things.

cleek · Nov 8, 2025

toxicity requires fairly specific intent.
LLMs have no intent.

cleek · Nov 8, 2025

JoHBE said:
Question: IS there a legitimate reason/need to make Chatbot output maximally indistinguishable from real human output? I can't really think of any.

quarterly sales numbers. the stats that drive the world.

internetomancer · Nov 8, 2025

Troper1138 said:
I'm no expert on this stuff, but I think it's still very much the case that "AI"--that is, Large Language Models--don't really "determine" anything, or have any real "goals" (whether "bring about world peace", "kill all humans!", or "bring about world peace by killing all humans!"). LLMs are just very complicated math machines, but have no actual sentience--no way to perceive the physical world.

I think at least for now you can comfortably ignore the fancy definition of "goal" and just treat them as algorithms that are trying to minimize some number. (And that number might be number of humans alive.)

That "goal"-- predicting the next word, receiving positive feedback from users and algorithms, helping the user, following your instructions, writing your stupid app-- is just an abstraction. But it's a helpful word anyway.

When you use an LLM, you need to pretend to ask it a question, and it needs to pretend to understand what you want, and it needs to pretend to understand your goals, and then pretend to come up with ways of helping you achieve them, and it needs to pretend that helping you is its goal. All of it is fake, but you need to play pretend for it to work.

Troper1138 said:
My understanding of these things is pretty much stuck in the "vibe coding" and "frantically skim Wikipedia and/or Ars Technica articles" stage--I was a liberal arts major, a long damned time ago--but, well, there's there's an XKCD for this. From over 8 years ago.

There was a cute opposite xkcd as well.

https://xkcd.com/2173/

odikweos · Nov 8, 2025

jasonmicron said:
Likely Ollama

Thanks, Ollama.

DCinWI · Nov 8, 2025

Fearknot said:
Why would "sound like a human on social media" be a design goal for AI? Not only is that hardly the gold standard for desirable behavior by humans, it also doesn't sound even close to the "killer app" that makes AI worthwhile or profitable.

Because advertising.

The AI companies can sell (on the down-low) "human-sounding" bots to social media companies. Then the social media companies can populate their services with these bots to claim a specific membership or participation level. That specific membership or participation level can be used as a marketing point for advertisers to run ads on the social media platforms.

DCinWI · Nov 8, 2025

JoHBE said:
Question: IS there a legitimate reason/need to make Chatbot output maximally indistinguishable from real human output? I can't really think of any.

Because advertising. See my reply above.

Whiner42 · Nov 8, 2025

What? AI companies can't stoke engagement by training LLMs to be ingratiating little toadies and then expect them to match the raw emotion and nastiness you'll find in forums?

Surprise, surprise.

"I'm sorry, Dave. I'm afraid I can't do that."

Nightdweller · Nov 8, 2025

I suspect it’s an artifact of training and controls. We’ve seen very toxic ai - it’s not hard to make.

raissa_lasorciere · Nov 8, 2025

I am not sure why this is a problem. Randomly drop determiners and conjunctions, misspell words, misplace commas, write the occasional phrase in all caps. Done.

Naito · Nov 8, 2025

Rather than seeing this as sad, I think this is encouraging. It kinda reinforces how most disussion is civilized, and there’s still just a small number of assholes. The training data is mostly nice with just some jerks, hence why it can’t imitate them effectively.

Though that could be too optimistic a view.

Derecho Imminent · Nov 8, 2025

with toxicity scores consistently lower than authentic human replies across all three platforms.

I never would have guessed that one day we would have such a thing as a toxicity score. What a world.

hillspuck · Nov 8, 2025

msawzall said:
We asked a computer if another computer sounded authentic, and the answer was a resounding yes!

Picking a nit, but the answer was a resounding "no".

“Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression,” the researchers wrote.

I wonder how the LLM scored on pedantry. Another dead giveaway for real humans on the internet.

WhatGravitas · Nov 8, 2025

My (unscientific, pure gut feeling-based) theory is that it's the same problem as being funny. For toxicity to "work", much like sarcasm or jokes, you need some level of empathy and/or theory of mind of the person you're interacting with.

LLMs don't maintain a model of mind (or model of reality) beyond the text they're currently operating on, so figuring out when a toxic comment actually lands or a joke fires is beyond the current architectures.

Add on top of that that, because of that, a lot of real toxicity is interactive (and hence, not just on the internet), there's less reliable training data - whereas "sounding intelligent" is something done much more formally, much more often in written form (producing training data).

VectorRevival · Nov 8, 2025

The study authors just don’t misunderstand different types of LLMs. They tested IFT instruction fine tuned models, which is understandable because that’s what basically everybody uses. IFT models are literally fine tuned to follow instructions, which to start with makes them less authentic. Also generally they are trained with RLHF to be docile polite netizens.

In contrast to IFT models, raw pretrained LLMs sound a LOT like humans. But these are rarely used for lots of reasons. The main one is that they don’t follow instructions, making them difficult to do anything with. But another is that they sound like the typical jerks you find on the Internet.

If you understand this distinction, every conclusion made is completely obvious.

10Nov1775 · Nov 8, 2025

PaulWTAMU said:
Wow, that's depressing in a whole new direction

Nah, something with no wants, needs, hurts, goals, motivations, or gastrointestinal indigestion has everything it needs to out-positive a human being—because what it needs is to have none of those things.

And I, for one, would certainly prefer that the robits keep their hands where I can see them, thank you very much. I don't think I'll ever find it depressing if we can reliably identify AI-generated text.

10Nov1775 · Nov 8, 2025

VectorRevival said:
The study authors just don’t misunderstand different types of LLMs. They tested IFT instruction fine tuned models, which is understandable because that’s what basically everybody uses. IFT models are literally fine tuned to follow instructions, which to start with makes them less authentic. Also generally they are trained with RLHF to be docile polite netizens.

In contrast to IFT models, raw pretrained LLMs sound a LOT like humans. But these are rarely used for lots of reasons. The main one is that they don’t follow instructions, making them difficult to do anything with. But another is that they sound like the typical jerks you find on the Internet.

If you understand this distinction, every conclusion made is completely obvious.

I mean, just designing an LLM with the goal of making it accurate, knowledgeable, and helpful, however you might define those things, regardless of tone, should seriously impede its ability to recreate social media comments; every bit of that design is taking it further away from what social media comments actually are.

Tongue firmly in cheek, of course. But also, seriously...

Being too nice online is a dead giveaway for AI bots, study suggests

Ars Centurion

Wise, Aged Ars Veteran

Ars Centurion

Ars Tribunus Militum

Ars Tribunus Angusticlavius

Seniorius Lurkius

Wise, Aged Ars Veteran

Ars Praefectus

Smack-Fu Master, in training

Ars Tribunus Militum

Ars Legatus Legionis

Ars Tribunus Militum

Seniorius Lurkius

Smack-Fu Master, in training

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Scholae Palatinae

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Tribunus Militum

Ars Praefectus

Wise, Aged Ars Veteran

Ars Tribunus Militum

toxicity is harder to fake than intelligence​

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Praefectus

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Scholae Palatinae

Smack-Fu Master, in training

Smack-Fu Master, in training

Wise, Aged Ars Veteran

Ars Legatus Legionis

Ars Scholae Palatinae

Smack-Fu Master, in training

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Ars Scholae Palatinae

toxicity is harder to fake than intelligence