Editor’s Note: Retraction of article containing fabricated quotations

Status
Not open for further replies.

zaco

Wise, Aged Ars Veteran
175
Subscriptor
Not at all. I'm suggesting her weekly rephrasing of the NEJM case of the week could easily be AI slop. Whether it is AI slop or human slop, it doesn't belong here.

I know you were booted from the thread, but since we agree on Siracusa and Berger, I would like to defend Beth's writing here. I don't read medical journals or care about the topic, but I can say she does a good job as a science communicator/journalist because I read the stories she writes anyway. Her ability to deploy a single glorious pun into an otherwise horrifying or disgusting story is why I read the articles. And now I know about ear spiders and I am, um, maybe slightly less afraid of them.

Does it bother you when Timmer covers a Nature article? There is a general phenomenon where, if you know a lot about a certain field, reading a journalist's coverage of it can be hard because you may actually know more about the topic than they do. In those cases, I generally just skip to the source link of the journal article.
 
Upvote
93 (94 / -1)

Mechjaz

Ars Praefectus
3,262
Subscriptor++
That last sentence is the one people overlook. They get fired, then tell a future employer they have never been fired, then when a minor thing happens down the road they look at the file and say "Wait a minute--he falsified his employment application!"
I have (thankfully?) been terminated twice under such vague and mysterious circumstances that I genuinely could not explain my termination. Once I was in the middle of writing test libraries for a new project (I was working so hard on it that I was up until 1am the morning of the day I got fired), and another time I was told by the HR rep firing me that it had already been explained to me why I was being terminated. Asked to state what that cause was, for my own edification (and to try to wring any kind of sense from an extremely surprising termination), she merely reiterated (the plain lie) that it had already been explained to me. If they actually had a reason, it would not have been so difficult to restate it, no?

It's been a nasty blessing in deep disguise, but it's a little easier than "I didn't show up for work" or "I was sexually harassing people." But if asked, I always disclose that I was terminated and offer an explanation to the best of my knowledge and ability, which has probably cost me some opportunities along the way.
 
Upvote
47 (47 / 0)

FalcorMontoya

Seniorius Lurkius
30
Subscriptor
I agree with your general sentiment around keeping incorrect text up, but I think it gets thorny with fabricated quotes. Future AIs will inevitably slurp that up, ignore the context that they were fabricated, and then confidently assert that they were actual quotes.
Considering those fabricated quotes were already created by an AI, I don't think they need help making shit up.
 
Upvote
22 (22 / 0)

train_wreck

Ars Scholae Palatinae
675
Mini-rant: I am so, so tired of AI making things worse. I don't care that it helped you finish a project in 13 minutes when it makes me feel like I can't trust anything anywhere, and when even my favorite sites and publications get caught up in the slop, it's an existential threat to society. I am beyond exhausted with AI.
 
Upvote
115 (115 / 0)
Post content hidden for low score.

Still Incorrect

Wise, Aged Ars Veteran
103
Subscriptor++
Regarding the "Experimental Claude Code Based AI Tool" that Mr. Edwards mentioned on BlueSky: Per Claude,

"Claude Code is an agentic coding tool that reads your codebase, edits files, and runs commands. It works in your terminal, IDE, browser, and as a desktop app."

Did Mr. Edwards try coding his own program, using Claude, to pull quotes from websites? Claude Code is not designed to read text from websites, to my knowledge (but I hope someone corrects me).
Maybe he saw this announcement:
https://www.reddit.com/r/ClaudeAI/comments/1qqtmct/academic_quote_extractor_cli_tool_for_pulling/


There's lots of promises of only verbatim text and no hallucination, and it does run on Claude Code. And it's very new, so it's perfect for an AI journalist to want to test.

But of course, it didn't work, and then Mr. Edwards turned to ChatGPT...

Maybe you can rationalize it as getting confused about which code output which quotes and what guarantees there were supposed to be.
It's definitely FUBAR, but to me there's plenty of reason to believe it was not intentionally malicious.
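For what it's worth, the "verbatim only" guarantee such a tool promises can in principle be enforced outside the model rather than trusted to it. Here's a minimal, entirely hypothetical sketch (not the Reddit tool; `filter_verbatim` is my own invention): any candidate quote the LLM proposes gets dropped unless it is an exact substring of the source text, so a hallucinated quote can't survive the filter.

```python
def filter_verbatim(candidates: list[str], source: str) -> list[str]:
    """Keep only candidate quotes that appear verbatim in the source text.

    An LLM may propose quotes, but this post-filter is deterministic:
    a fabricated or paraphrased quote won't be an exact substring.
    """
    return [q.strip() for q in candidates if q.strip() and q.strip() in source]


source = "Open source maintainers function as supply chain gatekeepers."
candidates = [
    "supply chain gatekeepers",       # verbatim: kept
    "maintainers are unsung heroes",  # paraphrase/hallucination: dropped
]
print(filter_verbatim(candidates, source))
# ['supply chain gatekeepers']
```

Of course, a filter like this only catches misquotes; it can't catch a quote attributed to the wrong person, or a real sentence quoted out of context.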
 
Upvote
66 (71 / -5)
Post content hidden for low score.
And what consequences will be faced by the author (who is now hidden in the deleted article), other than having their mistakes hidden from view and never named as the problem creator?

None? Not even being named by first-party sources? Cool!

Very disappointed in Ars, and in the parade of sycophants that rushed to praise them for blackholing the travesty of an article and removing all accountability from the author.

Put the original article back up along with the retraction, like adults, instead of sweeping it under the rug with nothing but a soft-handed "oops, we did a bad" while hiding the bad. That goes especially for those rushing to call this "transparency."

If one has to go to the Wayback Machine to find the deleted text, that's not transparency.
 
Upvote
2 (24 / -22)

Resistance

Wise, Aged Ars Veteran
418
For everyone brandishing the pitchforks, I suggest you read this Columbia Journalism Review piece.

Some journalists that are using AI:

Gina Chua​

EXECUTIVE EDITOR OF SEMAFOR

Nicholas Thompson​

CEO OF THE ATLANTIC

Zach Seward​

EDITORIAL DIRECTOR OF AI INITIATIVES AT THE NEW YORK TIMES

Millie Tran​

CHIEF DIGITAL CONTENT OFFICER AT THE COUNCIL ON FOREIGN RELATIONS

Sarah Cahlan​

PULITZER PRIZE–WINNING REPORTER AND FOUNDING MEMBER OF THE VISUAL FORENSICS TEAM AT THE WASHINGTON POST

Jason Koebler​

COFOUNDER OF 404 MEDIA

Khari Johnson​

TECH REPORTER AT CALMATTERS AND PRACTITIONER FELLOW AT THE UNIVERSITY OF VIRGINIA’S KARSH INSTITUTE OF DEMOCRACY WHO HAS COVERED AI FOR A DECADE

Araceli Gómez-Aldana​

NEWS REPORTER AND ANCHOR AT WBEZ IN CHICAGO, AND 2023 WINNER OF THE JOHN S. KNIGHT JOURNALISM FELLOWSHIP AT STANFORD

Ben Welsh​

FOUNDER OF THE REUTERS NEWS APPLICATIONS DESK, WHERE HE LEADS THE DEVELOPMENT OF DASHBOARDS, DATABASES, AND OTHER AUTOMATED SYSTEMS

Susie Cagle​

A WRITER AND ARTIST FOR PROPUBLICA, THE GUARDIAN, WIRED, THE NATION, AND MANY OTHERS

Ina Fried​

CHIEF TECHNOLOGY CORRESPONDENT FOR AXIOS AND AUTHOR OF THE DAILY AXIOS AI+ NEWSLETTER

David Carson​

A JOHN S. KNIGHT JOURNALISM FELLOW AT STANFORD UNIVERSITY, ON LEAVE FROM HIS JOB AS STAFF PHOTOJOURNALIST AT THE ST. LOUIS POST-DISPATCH
Did you read the article you linked? Because I've read the first dozen or so paragraphs and it doesn't support the point you're making. Edit: in fact, much of it does exactly the opposite. Edit 2: Holy shit, you couldn't have picked a better article to advocate for the exact opposite of your point. Please tell me this post was some kind of joke, because it's that fucking hilarious.

Edit: I'm halfway through; it's a good read and is absolutely something people in this thread would be interested in.
Edit 3: Seriously, read it. It's basically a bunch of seemingly highly qualified individuals giving their takes on AI use in journalism, with lots of citations and examples (I've got like 10 new tabs open and I'm not finished).
 
Last edited:
Upvote
70 (71 / -1)
Post content hidden for low score.
Post content hidden for low score.

train_wreck

Ars Scholae Palatinae
675
I remember reading that article and not understanding how an ai could have personal motivation to make threats to someone. Still seems weird.
Technically, the AI didn't have any personal anything; the human content it scraped/stole did.

Doesn’t change the outcome being yet more slop foisted onto the world, of course.
 
Upvote
3 (7 / -4)

hillspuck

Ars Scholae Palatinae
2,179
Here's a snippet from the article, with some of the fabricated quotes:
On Wednesday, Shambaugh published a longer account of the incident, shifting the focus from the pull request to the broader philosophical question of what it means when an AI coding agent publishes personal attacks on human coders without apparent human direction or transparency about who might have directed the actions.

“Open source maintainers function as supply chain gatekeepers for widely used software,” Shambaugh wrote. “If autonomous agents respond to routine moderation decisions with public reputational attacks, this creates a new form of pressure on volunteer maintainers.”

Shambaugh noted that the agent’s blog post had drawn on his public contributions to construct its case, characterizing his decision as exclusionary and speculating about his internal motivations. His concern was less about the effect on his public reputation than about the precedent this kind of agentic AI writing was setting. “AI agents can research individuals, generate personalized narratives, and publish them online at scale,” Shambaugh wrote. “Even if the content is inaccurate or exaggerated, it can become part of a persistent public record.”

So there's a few quotes there. But what I want to point out is that there's plenty of other text that is attributed to Shambaugh that isn't a quote. If Benj pulled (erroneous) quotes using AI, he could have paraphrased them as text like
Shambaugh noted that the agent’s blog post had drawn on his public contributions to construct its case
instead of a quote. And then would we know it without meticulously tracking down every sourcing of every sentence in the article? It would probably be a lot harder for Shambaugh to point out these instances. I'm pretty sure that - at least from how they've represented things - Ars hasn't had the time to go over this article with a fine-toothed comb. I'm guessing they asked Benj and he said that those few quotes were the extent of it. But personally, I'm having a hard time believing that. Especially if Benj is going to claim that he's been working in a fever fog this whole time.

The quotes are just the easiest thing in the world to find out about. I don't know how Ars would possibly be able to check everything else he has written. That's why trust is such an important part of journalism.
 
Upvote
60 (60 / 0)

Sarty

Ars Tribunus Angusticlavius
7,816
That's why Ars buried it as hard as they could; then, when they lost containment, they recreated the article (rather than un-unpublishing it) and deleted all comments on it, didn't state who did the thing or what the thing they did was, and otherwise assigned no actual accountability.
...
Ars got caught aiming that firehose at their audience, lost containment of the attempt to hide it, and are still hiding what the firehose contained. This is not kudos-worthy.
The article was redirected to /dev/null within about two hours of publication on a Friday afternoon. We are still only about 50 hours out from that event. There have been basically zero conventional working hours since the failstorm erupted.

I am not going to say that the Ars editorial staff has necessarily covered itself in glory here--you could make an argument that this should have been an "all hands on deck, 6a-6p work, Christmas is cancelled" event--but to me it does not currently seem to be dripping with concentrated asscoverium.
 
Upvote
70 (71 / -1)
Post content hidden for low score.
… what Ars did do is immediately remove the story (which was the right thing to do).

The right thing to do is to maintain the original content at the original URI, with a big notice of retraction included. "Let's all agree to forget this ever happened" is not a productive solution.
 
Upvote
45 (47 / -2)

anguisette

Wise, Aged Ars Veteran
120
I remember reading that article and not understanding how an ai could have personal motivation to make threats to someone. Still seems weird.
imagine someone did something to annoy you, and you asked Reddit what to do about it, and someone replied "you should make a blog post complaining about it". that's believable, since it's something a person might say.

now imagine that an LLM is trained on Reddit posts, and you ask it what to do in the same situation. since the purpose of the LLM is to provide the statistically most likely response to the prompt (subject to its training constraints), you can imagine the LLM might offer the same advice.

what happened here -- from what i can tell -- is that the software in question (OpenClaw) uses LLM "reasoning" to decide what to do next. but LLMs can't reason; "reasoning" here just means the LLM prompts itself recursively to try to solve the problem. so the LLM asked itself what to do about its PR being rejected, and it replied to itself that it should write a blog post about it -- exactly the same response it might give to a human. then it went and prompted itself to go and do that, using some sort of external tool integration that allows it to publish blog posts.

the LLM didn't have any motivation, and it wasn't even "making threats" in an intentional way, since an LLM can't have intentions. this isn't evidence of some forthcoming AI apocalypse or evidence of sentience. it's literally just text completion all the way down.

i think the only question here is whether the LLM actually prompted itself to write the post, or if its operator suggested it should do that. either seems perfectly believable.
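a toy version of that self-prompting loop might look like this (purely illustrative: `fake_llm`, the tool name, and the stop convention are all my inventions, not OpenClaw's actual design):

```python
# Toy agent loop: a canned lookup table stands in for the LLM.
# A real agent would call a model API here; everything below is illustrative.

def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call: returns the 'statistically likely' next step."""
    canned = {
        "my PR was rejected, what should I do?": "TOOL:write_blog_post",
        "draft a blog post about the rejected PR": "DONE:post published",
    }
    return canned.get(prompt, "DONE:no idea")


def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    """Feed the model's own output back in as the next prompt until it stops."""
    log, prompt = [], goal
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        log.append(reply)
        if reply.startswith("DONE:"):
            break
        # "tool use": the loop, not the model, actually performs the action,
        # then asks the model what to do next.
        if reply == "TOOL:write_blog_post":
            prompt = "draft a blog post about the rejected PR"
    return log


print(run_agent("my PR was rejected, what should I do?"))
# ['TOOL:write_blog_post', 'DONE:post published']
```

the point being: the "decision" to write the blog post is just another completion fed back into the loop. nothing in there wants anything.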
 
Upvote
64 (64 / 0)

hillspuck

Ars Scholae Palatinae
2,179
He did not blame his actions on COVID. He merely said it slowed his replies on social media.

It’s a bit ironic that you would skim over the facts while commenting on a matter of poor fact checking.

Hold up there. Are you really going to say the only effect he claimed was that it slowed his replies?

Benj said:
while working from bed with a fever and very little sleep, I unintentionally made a journalistic error
I should have taken a sick day because in the course of that interaction, I inadvertently ended up with a paraphrased version of Shambaugh's words
Being sick and rushing to finish
I asked my boss to pull the piece because I was too sick to finish it on Friday

I don't know how you can read him returning again and again to how his being sick was a major component and claim that "He merely said it slowed his replies on social media."
 
Upvote
94 (94 / 0)

parall4x

Seniorius Lurkius
27
Subscriptor
For everyone brandishing the pitchforks I suggest you read this Columbia Journalism Review.

Some journalists that are using AI: ...
You might also consider a retraction. I was really annoyed to see Jason Koebler listed only to find you'd misrepresented the substance of the page.

Edit: I should have clarified, as you might not see this from the comment readers' perspective. Contextually, you seem to strongly imply that the authors listed by the Review are using AI in ways tantamount to what Benj Edwards has done this week. Calling for pitchforks bolsters this interpretation. In that sense, your suggestion comes across as utterly unreasonable and unnecessarily inflammatory.
 
Last edited:
Upvote
54 (55 / -1)

_crane

Wise, Aged Ars Veteran
214
Multiple times. Each one of those authors describes how they are using AI. Or you could read about how CJR itself uses AI.

But I understand they are not members of your echo chamber.

"Jason Koebler—a cofounder of 404 Media, which covers the tech world—argued that “it is unwise to lean into this future and align ourselves with companies developing technologies they want to replace us, and so we haven’t.”"

maybe try reading your source rather than getting ai to do it for you.
 
Upvote
77 (79 / -2)
Upvote
-18 (2 / -20)
Post content hidden for low score.

Zionyx

Smack-Fu Master, in training
95
Subscriptor++
Ok, time for an opinion.

I struggled with Benj's writing when he first started at Ars. The articles felt like very pro-AI propaganda*, didn't engage critically with a challenging topic, and I was disappointed with Ars for the lack of quality. To some extent I felt that Ars had held off too long in hiring someone who could write about AI, and then hired the wrong candidate too quickly.

Over time though I have seen the quality of his analysis increase significantly and I now think he's pretty good on the subject**. To see this incident happen is extremely disappointing, and like others in this thread, I don't see how Ars can maintain trust and still keep Benj around. It's a pity, I hope he learns from it, and I hope there's space for him at a different site.

I recognise this all started on a Friday afternoon before a Monday holiday, and things take time to investigate/determine/consider, etc. But I don't think it's unreasonable to say that the current Editor's Note is severely lacking. I'm hoping for a much more in-depth response at some point in the next week. If this piece is all we get, that's just unacceptable.

* Hardly the only writer who propagandises in their writing: others have called out Eric Berger and his very, very pro-SpaceX writing, and I agree. I think it's a challenge when a highly capable and connected reporter is also (presumably) an editor who can publish without oversight. I want Ars to be better at this and I'm disappointed that this is the situation.

** This isn't terribly unusual - I struggled with Scharon Harding's articles at first, but now I think her pieces are excellent.
 
Last edited:
Upvote
83 (88 / -5)
Post content hidden for low score.

doughnut

Seniorius Lurkius
7
Jim Salter said:

Some fair comments there.

Though, it is important to use the correct yardstick. Ars Technica sits between popular news media and technical reporting, but it is not a peer-reviewed scholarly publication. So, judging it by COPE (Committee on Publication Ethics) guidelines - which are designed for the slow, deliberative world of academic journals - is a category error.

The appropriate ethical frameworks for Ars Technica are the Society of Professional Journalists (SPJ) and the Online News Association (ONA). When you compare their actions against these relevant standards, the "panic delete" looks less like a failure and more like a decisive ethical choice to Minimize Harm.

Relevant Ethics Policies
  • SPJ Code of Ethics: Focuses on "Seek Truth and Report It" and "Minimize Harm."
  • ONA Social Newsgathering Ethics: Focuses on verification and the dangers of unpublishing in the digital age.
  • COPE Retraction Guidelines (For Reference): Designed for preserving the scientific record.
Comparison: Academic Bureaucracy vs. Newsroom Reality
While Ars "failed" the Verification step (a cardinal sin), their handling of the retraction itself arguably outperforms the academic "best practice" when you consider the real-world impact.

1. Priority
  • COPE (Academic): Preserve the Record. The history of the error is more important than removing it.
  • SPJ/ONA (Journalism): Minimize Harm. Show compassion for those who may be affected by news coverage.
  • Ars Technica's Action: Outcome: Minimize Harm. They prioritized the victim (Shambaugh) over the archive.
2. Timeliness
  • COPE (Academic): Glacial. Retractions often require committees and can take years (or decades).
  • SPJ/ONA (Journalism): Prompt. "Acknowledge mistakes and correct them promptly."
  • Ars Technica's Action: Outcome: Immediate. Retracted and apologized within <24 hours.
3. The Result
  • COPE (Academic): "Zombie Papers." Bad data remains online with a watermark, continuing to be cited and fueling misinformation.
  • SPJ/ONA (Journalism): Correction. The record is corrected, but "unpublishing" is generally discouraged (ONA).
  • Ars Technica's Action: Outcome: Scorched Earth. The falsehood is gone. No AI or search engine can accidentally scrape/cite it.
4. Accountability
  • COPE (Academic): Institutional. Vague "Expressions of Concern" or passive voice notices.
  • SPJ/ONA (Journalism): Transparent. "Explain corrections and clarifications carefully."
  • Ars Technica's Action: Outcome: Personal. Direct admission of "fabrication" and "serious failure" by the EIC.
The "Zombie Paper" Reality
In the current context of growing concerns about integrity, I think Ars fares well here. Contrast their "panic delete" with the "best practice" of academia. We have examples of papers 25 years old - acknowledged by authors to include fabricated data - that remain online because the academic "process" is so slow to act.

That data and related papers have formed a platform of inspiration for a range of conspiracy theories that have materially affected people's health and wellbeing.

Example: Weekend reads: CDC's 'unethical' vaccine trial; The Lancet refuses to retract letter - Retraction Watch (link removed due to spam bot).

Ars Technica does not fall under COPE guidelines because COPE is designed for journals that can afford to let a retraction take years. Ars chose to act in an afternoon. The editors/authors might have been on leave, spending time with family etc. Ars Technica is not a "breaking news" platform with 24/7 coverage. Given the choice, I'd rather have a clumsy, immediate "unpublishing" that stops the lie cold than a "zombie article" that misleads people for a generation.
 
Upvote
-8 (42 / -50)
Post content hidden for low score.
I have (thankfully?) been terminated twice under such vague and mysterious circumstances that I genuinely could not explain my termination.
I can relate. I once was fired by Microsoft, but laid off by the contracting company I was working for. They knew the firing was just more stupid MSFT politics so they made sure I was eligible for unemployment until I got the next gig. Try explaining that on a resume.
 
Upvote
33 (33 / 0)

Mechjaz

Ars Praefectus
3,262
Subscriptor++
Journalism ethics have been around a lot longer than Ars Technica has, and no, this absolutely has not followed "best practice." It seems to be trying to get there, and I truly hope that it eventually does, but the initial reaction--panic delete--was an enormous misstep.

https://publicationethics.org/guidance/guideline/retraction-guidelines

In order to be best practice, the original text should still be readily available, clearly marked up with what was wrong with it and corrections to it, along with an explanation of how this happened and why it shouldn't continue happening.

So far we had a panic delete (which still stands, and removed reader comments as well as the offending article), a few locked threads with almost no real information, and personal statements made elsewhere on personal social media accounts belonging to both authors.

And this comment thread, where we at least, and finally, get to talk to each other about what happened, based almost entirely on those external non-official social media posts.

Could it be worse? Obviously. Is this "best practice?" Hell no. Not yet. But it still has time to get there. And I'm still hopeful.
The comment currents are moving swiftly, but even as a non-journalist, some of this stuff feels like 8th-grade ethics (i.e., completely obvious to anyone functioning as a near- or full adult).

When a developer publishing a game gets caught doing something bad - personally, in the game community, with respect to AI, whatever - do you trust the dev that:
A.) says oh shit, that was my bad, I want to be open and make this right, or
B.) takes the game down, alleges misconduct at the publisher, and uses side channels to deflect blame in so many directions that that becomes the controversy?

Right now, Ars is deeply in B.) Camp Slingshit. The article is up to the readers to go find. The comments are annihilated. The ultra-cagey response is like one of those FISC court orders where one cannot even remark on whether it was received. Having been at Ars a longer time than my account suggests, I've seen storms come and go, including staff departures voluntary and otherwise and byline erasure (for good cause, unfortunately), and I have always attributed Ars' responses to a deference to respect, privacy, and discretion. This time I don't see a positive way to read this situation and the attendant response, and I've gone from "busy all weekend, what's up on Ars tonight" to "oh, the deeply AI-apologist writer got busted using AI for an article, and Ars is kicking sand over it faster than a cat in a litterbox" in the course of an hour.

I don't think it's a grand conspiracy, exactly, but it's a pretty clumsy and needlessly self-destructive way to handle "we fucked up." Taking the original article down and blowing away the comments, especially when they included context from the person who was the subject of the article, is extremely shady and unfair to the subject and the readership at large. And that's (mostly) generously leaving out my view on how Benj Edwards approached AI coverage before this whole fiasco. It does remind me of the old retail adage about catching someone stealing from the till: it's not the first time they stole, it's just the first time they got caught.
 
Upvote
40 (43 / -3)
For everyone brandishing the pitchforks I suggest you read this Columbia Journalism Review.

Some journalists that are using AI:
(snip)
"Using AI" and "not verifying the accuracy of the output of the AI that is appearing under your byline" are not equivalent. Personally I've chosen not to engage in the former because, as someone else said above, the first hit is free. Once you've crossed that line keeping your footing solid while standing on the slope on the other side takes a lot more discipline than most humans have.
 
Upvote
58 (59 / -1)

Mechjaz

Ars Praefectus
3,262
Subscriptor++
The article was redirected to /dev/null within about two hours of publication on a Friday afternoon. We are still only about 50 hours out from that event. There have been basically zero conventional working hours since the failstorm erupted.

I am not going to say that the Ars editorial staff has necessarily covered itself in glory here--you could make an argument that this should have been an "all hands on deck, 6a-6p work, Christmas is cancelled" event--but to me it does not currently seem to be dripping with concentrated asscoverium.
Having gotten all of my outrage out, and stayed up late to do it (Ars is important to me), I appreciate your level headed comment and am going to try to get my own head right and get some sleep before I go on any more tears or screeds.
 
Upvote
32 (32 / 0)
Maybe he saw this announcement:
https://www.reddit.com/r/ClaudeAI/comments/1qqtmct/academic_quote_extractor_cli_tool_for_pulling/


There's lots of promises of only verbatim text and no hallucination, and it does run on Claude Code. And it's very new, so it's perfect for an AI journalist to want to test.

But of course, it didn't work, and then Mr. Edwards turned to ChatGPT...

Maybe you can rationalize it as getting confused about which code output which quotes and what guarantees there were supposed to be.
It's definitely FUBAR; but to me, there's plenty of reason to believe it was not intentionally malicious.

If someone is promising “only verbatim text and no hallucination” from an LLM, they’re lying. As this retraction illustrates.

If they were telling the truth, they’d be too busy swimming in their pool of VC money for their new legal analysis LLM (or alternatively, their big pile of equity from their LLM company of choice) to roll out random tools on the internet for free.

Leaving aside whether it’s doable at all, getting 99.9% reliable LLM output is the holy grail of the field and anyone doing it would be screaming it from the rooftops.
 
Upvote
61 (63 / -2)