New "computational Turing test" reportedly catches AI pretending to be human with 80% accuracy.
See full article...
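(For context: the "computational Turing test" in the article boils down to training an ordinary text classifier on human-written versus LLM-written replies and reporting its accuracy. A minimal sketch of the idea in Python, assuming scikit-learn is available; the placeholder data, features, and model here are illustrative stand-ins, not the study's actual pipeline.)

```python
# Minimal sketch of a "computational Turing test": train a classifier to
# separate human-written replies from LLM-written ones, then report accuracy.
# Data, features, and model are placeholders, not the study's actual setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

human_replies = ["This take is trash.", "lol no"]  # placeholder data
llm_replies = ["Hope you're having a great day!", "Great point, thanks for sharing!"]

texts = human_replies + llm_replies
labels = [0] * len(human_replies) + [1] * len(llm_replies)  # 1 = LLM

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels
)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(X_train, y_train)

# The headline "80% accuracy" is just this number, computed on a real dataset.
print(f"accuracy: {clf.score(X_test, y_test):.0%}")
```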
> This matches my expectations, I see lots of accounts that strike me as inauthentic, saying things like, "you have an interesting point of view, I would like to hear more." It doesn't advance conversation or even seem to be aware that there is a conversation happening.

I guess it depends on the context, but I will occasionally respond “your ideas are intriguing to me and I wish to subscribe to your newsletter” to someone, and I do believe that so far I have always meant “you are a complete fucking moron”. And it was understood as such by third parties, although possibly not by the recipient.
> It's not so much that humans are unrelentingly negative online. It's just that (most) humans can switch in and out of sarcasm depending on context, something an LLM probably couldn't do.

Yeah, uh huh.
> The study also revealed an unexpected finding: instruction-tuned models, which undergo additional training to follow user instructions and behave helpfully, actually perform worse at mimicking humans than their base counterparts.
They didn't test Grok.
> The one really easy to spot clue in my experience is posts with mixed up gender references. Either that or mixing up gender is a real problem for some humans.

Or they’re native speakers of a language that doesn’t pay special attention to gender (spoken Chinese, for example).
As someone who tries desperately to avoid toxicity (due to the anxiety it causes me—not that I don't get anxious about all interactions…), let me unleash things for just a moment:
As someone who also took up em-dashes with ease after starting to use compose keys a few years ago, I'm fucking sick and tired of learning that everything I do is what LLMs are tuned to do. This is fucking bullshit.
> I use Gemini, and I have found that one thing it has real trouble with is humor! Even when it has the context, it goes right over its poor head! It inevitably comes back with some long babble about the punch line...

Humor is a high level skill, and I was surprised when the models started being able to explain it at all. That they are a bit wooden at artfully employing it during conversation isn't all that surprising.
So, AI models are “easily distinguishable” because they’re too friendly? What a scandal! Imagine the horror: a reply that says “Hope you’re having a great day” instead of “This take is trash.” Clearly, no human could ever be that polite online. And those classifiers catching AI with 70–80% accuracy? That’s adorable. It’s like bragging about spotting a mime in a rock band—yes, the contrast is obvious, but does it really matter?
Maybe instead of trying to make AI sound more like humans, we should be asking why humans sound like emotionally stunted bots. If friendliness is the giveaway, maybe the real problem is that the internet has trained us to expect sarcasm, snark, and passive aggression as the default tone. So yes, AI is too nice—and that’s the dead giveaway. How dare it.
> I didn't have much trouble getting Copilot to say this:

That bot might have one of the better takes in this thread.
> I guess it depends on the context, but I will occasionally respond “your ideas are intriguing to me and I wish to subscribe to your newsletter” to someone, and I do believe that so far I have always meant “you are a complete fucking moron”. And it was understood as such by third parties, although possibly not by the recipient.

Yeah what I'm describing is different from that, it's often in response to something really mundane or tame where there's no reason why someone would give a sarcastic response, so best I can tell it's meant to read as sincere.
> with toxicity scores consistently lower than authentic human replies across all three platforms.

> To counter this deficiency,

Um, they consider not being toxic enough a deficiency that has to be countered?
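(For reference: a "toxicity score" is itself just the output of another classifier. A sketch of how such a number is typically produced, using the open-source Detoxify package as a stand-in; the study may well have used a different scorer, such as Google's Perspective API.)

```python
# Sketch: scoring replies for "toxicity" with an off-the-shelf classifier.
# Detoxify is a stand-in here; the study's actual scorer may differ.
from detoxify import Detoxify

model = Detoxify("original")  # pretrained toxic-comment classifier

replies = [
    "Hope you're having a great day!",  # typical LLM register
    "This take is trash.",              # typical internet register
]

for text in replies:
    scores = model.predict(text)  # dict of per-label probabilities
    print(f"{scores['toxicity']:.3f}  {text}")
```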
> Well, that’s easy enough, everyone just has to end their correspondence with “and I fucked your mother last night”.

AI fymln
At long last, we have a way to tell the humans from the clankers.
> The study also revealed an unexpected finding: instruction-tuned models, which undergo additional training to follow user instructions and behave helpfully, actually perform worse at mimicking humans than their base counterparts.

Yes, who could possibly have expected that training specifically intended to produce outputs less like the average human's would result in a model that's worse at the very thing it's being trained not to do.
> Might it be that AI determines that negativity decreases the likelihood that its goal will be achieved?

I'm no expert on this stuff, but I think it's still very much the case that "AI"--that is, Large Language Models--don't really "determine" anything, or have any real "goals" (whether "bring about world peace", "kill all humans!", or "bring about world peace by killing all humans!"). LLMs are just very complicated math machines, but have no actual sentience--no way to perceive the physical world.
> toxicity is harder to fake than intelligence

It really depends on what "intelligence" means. Is it IQ? Is it the ability to pass tests in school? Or is it the ability to live in good understanding with other "intelligent" beings?
> Question: IS there a legitimate reason/need to make Chatbot output maximally indistinguishable from real human output? I can't really think of any.

quarterly sales numbers. the stats that drive the world.
> I'm no expert on this stuff, but I think it's still very much the case that "AI"--that is, Large Language Models--don't really "determine" anything, or have any real "goals" (whether "bring about world peace", "kill all humans!", or "bring about world peace by killing all humans!"). LLMs are just very complicated math machines, but have no actual sentience--no way to perceive the physical world.

I think at least for now you can comfortably ignore the fancy definition of "goal" and just treat them as algorithms that are trying to minimize some number. (And that number might be the number of humans alive.)
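(To make "minimize some number" concrete: for a base LLM, that number is the cross-entropy between its next-token predictions and real text. A toy PyTorch sketch; the model and data below are placeholders, not any real training setup.)

```python
# Toy illustration of the "number an LLM minimizes": next-token cross-entropy.
# Model, vocabulary size, and data are placeholders.
import torch
import torch.nn as nn

vocab_size = 100
model = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))

tokens = torch.randint(0, vocab_size, (1, 16))   # a fake "sentence"
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict each next token

logits = model(inputs)  # shape: (1, 15, vocab_size)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # training = nudging weights to make this number smaller
print(loss.item())
```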
> My understanding of these things is pretty much stuck in the "vibe coding" and "frantically skim Wikipedia and/or Ars Technica articles" stage--I was a liberal arts major, a long damned time ago--but, well, there's an XKCD for this. From over 8 years ago.

There was a cute opposite xkcd as well.
> Likely Ollama

Thanks, Ollama.
> Why would "sound like a human on social media" be a design goal for AI? Not only is that hardly the gold standard for desirable behavior by humans, it also doesn't sound even close to the "killer app" that makes AI worthwhile or profitable.

Because advertising.
> Question: IS there a legitimate reason/need to make Chatbot output maximally indistinguishable from real human output? I can't really think of any.

Because advertising. See my reply above.
> with toxicity scores consistently lower than authentic human replies across all three platforms.

I never would have guessed that one day we would have such a thing as a toxicity score. What a world.
> We asked a computer if another computer sounded authentic, and the answer was a resounding yes!

Picking a nit, but the answer was a resounding "no".
> “Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression,” the researchers wrote.
> Wow, that's depressing in a whole new direction

Nah, something with no wants, needs, hurts, goals, motivations, or gastrointestinal indigestion has everything it needs to out-positive a human being—because what it needs is to have none of those things.
> The study authors just don't understand different types of LLMs. They tested IFT (instruction fine-tuned) models, which is understandable because that's what basically everybody uses. IFT models are literally fine-tuned to follow instructions, which to start with makes them less authentic. Also, generally they are trained with RLHF to be docile, polite netizens.

> In contrast to IFT models, raw pretrained LLMs sound a LOT like humans. But these are rarely used, for lots of reasons. The main one is that they don't follow instructions, making them difficult to do anything with. But another is that they sound like the typical jerks you find on the Internet.

> If you understand this distinction, every conclusion made is completely obvious.

I mean, just designing an LLM with the goal of making it accurate, knowledgeable, and helpful, however you might define those things, regardless of tone, should seriously impede its ability to recreate social media comments; every bit of that design is taking it further away from what social media comments actually are.
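(For anyone who wants to poke at the base-versus-IFT distinction directly: with the Hugging Face transformers library, a base model simply continues your text, while an instruction-tuned model gets its input wrapped in a chat template before generation. A rough sketch; the model names are just examples, and any base/instruct pair would do.)

```python
# Sketch of the base-vs-instruction-tuned distinction using Hugging Face
# transformers. Model names are illustrative; any base/instruct pair works.
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "lol you can't be serious, the ref was clearly"

# Base model: pure next-token continuation -- it sounds like its training data.
tok = AutoTokenizer.from_pretrained("gpt2")
base = AutoModelForCausalLM.from_pretrained("gpt2")
out = base.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=30)
print(tok.decode(out[0]))

# Instruction-tuned model: input wrapped in a chat template, and the model is
# trained (IFT + RLHF) to answer helpfully and politely -- the register the
# classifiers in the study pick up on.
tok_i = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
inst = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
chat = tok_i.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt",
)
out_i = inst.generate(chat, max_new_tokens=30)
print(tok_i.decode(out_i[0]))
```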