From the article:

> I take solace in two historical parallels rooted in craft. When the camera was invented, painters feared for their jobs. But instead Impressionism emerged—a new form of painting not replicable through cameras. Photography liberated painters from realism, allowing a new focus on the subjective experience using visible brushstrokes and techniques that cameras couldn’t capture.

<snip>

Edit: I'm halfway through; it's a good read and is absolutely something people in this thread would be interested in.

Edit 3: Seriously, read it. It's basically a bunch of seemingly highly qualified individuals giving their takes on AI use in journalism, with lots of citations and examples (I've got like 10 new tabs open and I'm not finished).

Ah yes, Impressionists -- famous for totally not starving. No parallels between "freeing" Impressionists from the drudgery of paying gigs and people's concerns about AI.
The victim admits that, while the quotes were not his, the meaning was entirely something he might have said. As such, the fake quotes were not damaging. They are a theoretical issue more than anything.
> I think that's entirely plausible. I sincerely hope too much damage is not done in the meantime.

It's not plausible to me. This is not their first rodeo. This isn't even unprecedented in the last 5 years. Some form of "we will have more to say on this topic once we have completed our investigation, but that will take some time" should have been the last line of Ken Fisher's statement, and I don't think there was a good reason to omit it.
"We're not used to dealing with fuckups of this magnitude, and it just didn't occur to me to be that explicit" is about the best I can do.
> Sure but that's not what happened here. There is nothing novel or deceptive about calling an article that contains sloppy AI-generated content, AI slop.

This. If a YouTube video contains a bunch of DALL-E graphics, it's slop even if an actual human is in the video doing narration or something.
> Every time I search for technical information, only to find a mess of auto-generated, referral-link-infested slop promoted via SEO, I keep thinking of how a modern web directory could help - even if it only covered a small proportion of sites.
>
> Ultimately, when looking for information, the best option is to go straight to a website that is likely to have the information to begin with. Astronomy gear? Skip the horrendously-incorrect slop sites full of random referral links and go straight to Cloudy Nights or Astrobin. Computer gear? Skip the search engine and look to see if Ars or TechPowerup has a review. Search is rapidly becoming worthless, and we need to fall back on trusting individual sites.
>
> But how to know which sites even cover a topic to begin with? It used to be that you just went to Yahoo Directory or DMOZ to find websites on a particular topic. These days, you need to stumble across them. Returning to a directory-based approach could help with discovery.
>
> But there would need to be some way to keep the slop factories out. If the directory was fully open and they could just spam their multitude of identikit sites into every category, that would kill the directory as quickly as they killed web search.
>
> Even better would be some kind of carefully-curated list ranked by reputation (as assessed by experts in the field). But getting experts to even agree in the first place would be a challenge.

Something like this is sort-of tenable with the "indie web" of small personal websites that still exist, e.g. Neocities.
> Every time I search for technical information, only to find a mess of auto-generated, referral-link-infested slop promoted via SEO, I keep thinking of how a modern web directory could help - even if it only covered a small proportion of sites.

It's an interesting thought, isn't it? Now I'm remembering a semi-serious/semi-joking paper I wrote in 2008 for an algorithmically generated evaluation system for determining the trustworthiness of social media accounts. I'm not a CS person, but I knew there wasn't a good way to do it then, and I mostly was trying to show original thinking for the class I was taking. Now I'm like, "Hmm. Something like that could probably be done with AI (or 'AI') to evaluate, categorize, and assign trustworthiness scores to sites in a web directory."
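If anyone wanted to prototype that trust-score idea, the core of it could be very small. A hypothetical sketch in Python (every name, weight, and rating below is invented for illustration; this is not a real system):

```python
# Hypothetical sketch of a reputation-ranked directory entry, as mused
# about above. All names and numbers are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Site:
    url: str
    topic: str
    endorsements: dict[str, float] = field(default_factory=dict)  # expert -> rating in [0, 1]

def reputation_score(site: Site, expert_weight: dict[str, float]) -> float:
    """Average the expert ratings, weighted by each expert's own standing.

    Unknown endorsers get zero weight, which is one crude way to keep
    slop factories from voting themselves into the directory.
    """
    total = sum(expert_weight.get(e, 0.0) * r for e, r in site.endorsements.items())
    weight = sum(expert_weight.get(e, 0.0) for e in site.endorsements)
    return total / weight if weight else 0.0

astro = Site("https://www.cloudynights.com", "astronomy gear",
             endorsements={"alice": 0.9, "bob": 0.8})
experts = {"alice": 1.0, "bob": 0.5}  # standing as assessed within the field
print(f"{astro.url}: {reputation_score(astro, experts):.2f}")  # prints 0.87
```

Getting the expert weights right is, of course, exactly the "getting experts to even agree" problem the comment points out.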
Hey guys. I appreciate your rapid response to this situation, and the retraction is the right thing to do. However, you're not following optimal journalistic practices here. The article and its associated comments should not have been taken down (and should be restored). The article should've been prepended with a notification that it was being investigated, and again with the retraction once that was decided upon. Failing to leave the text of the article and the comments up lacks transparency and is not what I'd hope to see from a publication that I regard as highly as Ars.
In addition, Ars' LLM/AI policy needs to be further clarified. LLMs/AI tools should NOT be used for the planning or writing of articles in any way whatsoever. Using those tools for outlining purposes, as Benj says he has done, is not, as far as I can see, prohibited by the letter of your policy as quoted in this retraction. The only exception I'd consider would be using an LLM as a glorified search engine to find links to actual sources. But the output of the LLM must not be used in any way aside from the links. And I think it's safer to just avoid them entirely.
Finally, we need a detailed post-mortem. How did this happen, and what changes are being made to prevent something similar from occurring again? While I understand that discussion of internal personnel matters is inappropriate, I do believe that this is serious enough to justify a change in staffing. Ars' credibility is on the line.
> Context matters. According to Benj Edwards' website, he is the Senior AI reporter. What does this mean?

How would you want to be treated in a similar situation? Have you (and others) never made a professional mistake? And for those of you who have, and got the cold sweats when you realised, didn’t you swear to never let that happen again?
Journalism is so important, but nobody actually died from this mistake, and the scar tissue is invaluable.
> Here's my suggestion: Don't use AI tools to write, don't use them to "assist", don't even use them to summarize. A complete moratorium on AI writing or inquiries. Yes, of course I'd say this... but that this happened using AI was, frankly, inevitable. It's the nature of the tool, and it WILL happen again, even if its writing is "proofread". The work it takes to verify each claim AI makes is better spent just doing the initial research and writing it with a human... by a human. You could, I suppose, have a whole other person do it if you wish, though the HR department may object.

No, no, no, the problem with all these AI debacles is that the people involved are not as Very Smart™ as I am. When I use AI I herd the cats with my own very special skills that no one else has.
> Maybe I’m more lenient than most but I’m not mad at all about this. Errors in judgement happen all the time. I guarantee everyone commenting has had a serious mistake or two in their lives.

Errors in judgement should be forgiven. Errors in morals should not be forgiven. I do not believe that Benj made an error in judgement, but instead he made an error in morals.
> Basically the guys at the Plain Dealer record their interviews and then have AI write the story, which they supposedly verify and double-check. He says: "By removing writing from reporters’ workloads, we’ve effectively freed up an extra workday for them each week."
>
> I guess the stakes are lower in local journalism, but if this is the modern pipeline that young journalists are supposed to go through in order to eventually land better jobs, the future is looking bleak.

Wow! Talk about misunderstanding the purpose of a reporter! A reporter should select his words extremely carefully, to understand the impact of those words, and to trigger some type of emotional response in a reader.
> "Everything in the article except the quotes" is a huge problem. It's an even bigger problem that the quotes were from a website that could easily have been verified using a copy/find/paste. It seems like the person who was incorrectly being quoted took it in stride, but what if he didn't? Condé Nast is a huge company, and if the person didn't take the quote in stride, they could have sued. It shows negligence on Benj's part not to do a quick verification that would have taken just a minute.
>
> The only way to avoid this is to just not use AI, which is pretty easy; we all managed to do it before a couple of years ago. Many of us still manage to do it today.

Like I said, at no point did I say there was not a problem. I wholeheartedly agree with folks that this is a major problem and one that should be taken seriously by Ars. I don't think the author should have used an AI tool at all, even if he were not dealing with brain fog or sickness and were otherwise able to have caught this mistake. I agree that it is a perfectly reasonable expectation for journalists to just read blog posts themselves and copy any quotes they want to use manually, the old-fashioned way.
This also damages the likelihood of people wanting to be interviewed by Benj. For example, I wrote a book about AI. If Benj wants to interview me, would I let him? No. I do not trust that he's able to quote me without making mistakes.
> That may have been a result of some closer oversight of her work. Do you not remember the two articles she wrote early on (regarding NFTs, if I remember correctly) that got torn to shreds in the comments similar to this one?

I don't, actually. Do you have a link for that?
> 1) Why didn't the review process catch the quotations that didn't actually exist? The whole point of a review process is to verify this exact sort of thing. There are definitely some major issues with the review process that did not catch this before publication, especially knowing that the author is someone who uses AI tools in their workflow for purposes of learning.

Once upon a time, this was the role of "sub-editors" or "copy editors". They would check the article for factual accuracy, readability and conformance to house style, and the article wouldn't go out the door until they were done.
> Good luck with that.

Maybe The Register, if you're a gentleman of culture? Not more responsible, but possibly similar?
Context matters. According to Benj Edwards' website, he is the Senior AI reporter. What does this mean?
1. Out of all the journalists on Ars' staff, he is the expert on "AI". He should know that LLMs, like Claude and ChatGPT, are probabilistic word generators. In fact, according to his website, he's the one who coined the term "confabulation". Clearly, he used an LLM (which one? What's the tool name? We don't know, I can't find it) that falsely claimed to extract quotes from websites without questioning how such a tool could be possible.
2. He's a reporter (another word for journalist). I do not know if Mr. Edwards has a degree in journalism, but his Ars Technica bio says he has over 20 years of journalism experience. He knows to always check quotes, because of how quickly people are willing to sue in the United States. That is something that is taught on the first day of a journalism class.
3. He has a duty, as a journalist, to check all quotes. Checking that quote requires a simple copy/paste to the website. He clearly did not do it, even though it could quickly be done.
This is not a "learning opportunity" where he could move forward on the AI beat. Either he does not understand how AI works, and therefore he should not be on the AI beat, or he does not do basic due diligence after 20 years of working as a journalist, and in that case he should not be a journalist, because he'll be too much of a liability for his employer.
I also question the "I have a fever and I tried using a new AI tool". When people have a fever, they revert to the processes they are most familiar with, because their brains have less ability to learn new things. And even if a tool was used, it is quick to copy and paste the quote into the "find" bar in a web browser to confirm the quote is real.
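That manual check is also trivially automatable. A minimal sketch, assuming nothing beyond the Python standard library (the URL and quote below are placeholders, not the ones from the retracted article):

```python
# Fetch the cited page and confirm the quoted string actually appears in
# it, modulo whitespace and case. A crude stand-in for copy/paste + Ctrl-F.
import re
import urllib.request

def _squash(s: str) -> str:
    """Normalize whitespace and case so minor formatting differences don't matter."""
    return " ".join(s.split()).lower()

def quote_appears_on_page(url: str, quote: str) -> bool:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    return _squash(quote) in _squash(text)

if __name__ == "__main__":
    print(quote_appears_on_page("https://example.com/post", "the quoted sentence"))
```

A failure here wouldn't prove a quote is fake (it might be paraphrased or split across markup), but a pass is cheap and a fail is exactly the red flag that should stop publication.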
My uni is otherwise inclined, so reading the full text end to end is not happening. I don't have space for all the violins in my head.
The idea is to use an agentic LLM to do the reading and report the quotes back with a zero-hallucination guarantee. To make sure I get that guarantee, I get the LLM to bring me the IDs and scores for quotes. The quote text itself is always pulled verbatim from the database, never generated.
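For readers wondering what that pipeline might look like, here is a minimal sketch of the ID-based scheme described above: the model only ever returns IDs and scores, and the quote text is looked up verbatim from a local store. The table schema and the JSON shape of the model output are assumptions for illustration, not the commenter's actual tool.

```python
# Resolve LLM-selected quote IDs to verbatim text from a local database.
# The model never supplies quote text, only IDs and relevance scores.
import json
import sqlite3

def resolve_quotes(db_path: str, llm_json: str) -> list[dict]:
    # Expected model output: [{"id": 17, "score": 0.92}, ...]
    picks = json.loads(llm_json)
    conn = sqlite3.connect(db_path)
    results = []
    for pick in picks:
        row = conn.execute(
            "SELECT text, source FROM quotes WHERE id = ?", (pick["id"],)
        ).fetchone()
        if row is None:
            continue  # the model invented an ID; drop it rather than trust it
        results.append({"quote": row[0], "source": row[1], "score": pick["score"]})
    conn.close()
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

The "zero hallucination" property only covers the quote text itself; the model can still pick irrelevant quotes or invent IDs, which is why nonexistent IDs are silently dropped here.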
> Nearly the entirety of this thread has been emotional. Anger at how Ars has handled the retraction. Expressions of feeling "betrayed" by being fed misinformation. And most disturbingly, in my opinion, calling for the summary dismissal of an employee without a complete and thorough investigation.
>
> This thread has become a horde of villagers with pitchforks standing outside of Frankenstein's castle, with a small contingent of other villagers at the gate saying "well, let's just wait a moment and evaluate this."

I'm curious, what is left to investigate?
> Once upon a time, this was the role of "sub-editors" or "copy editors". They would check the article for factual accuracy, readability and conformance to house style, and the article wouldn't go out the door until they were done.

One of my favorite things about Ars is that they're often days behind the media frenzy to get a story out. I like to think they use that time to let the story cool off and develop interesting insights. I seriously hope that Ars management is not pushing writers just to get stuff out the door. I see it as a badge of honor for Ars to take days to put out an in-depth take.

However, in the rush to publish more articles at a higher cadence, the art of copy editing has pretty much disappeared. Even reputable newspapers often outsource it now to third parties (for example, one company dominates the space here in Australia, with in-house subs being a rarity). There's a general consensus that these outsourced copy editors don't have the care, diligence or understanding of genuine copy editors, and are focused more on style than accuracy.

Conventional wisdom would consider this kind of editorial role a prime candidate to be outsourced to LLMs entirely. But it seems to me that with the increasing adoption of fallible models, human editorial oversight should be more important than ever.
> These three things are known, and I believe they are enough to make a decision on Benj's fate. The question becomes: What does "not permit" actually mean? I'd be happy with a promise not to use AI again and a breakdown of how things will change at Ars Technica as a result.

A promise to not use AI in direct and specific violation of policy again, plus two bucks, will get me a nice coffee at Starbucks.
> Ah, yes, the copy editor. Haven't heard that phrase in years, perhaps decades. No one's willing to pay for what is perceived as a QC role.

Which ironically is how this whole situation started in the first place.
The publishing equivalent of having someone else check your code.
> As for lawsuits, I don't really think that is an issue; critically, one of the main requirements for defamation is reputational harm, but IIRC the hallucinated quotes did not harm the subject's reputation.

You're absolutely right, it's not an issue. In the USA, he would have to show harm for defamation, which clearly didn't happen...
> Clearly, he used an LLM (which one? What's the tool name? We don't know, I can't find it) that falsely claimed to extract quotes from websites without questioning how such a tool could be possible.

I searched with Kagi for "claude code pull quote extractor" and found a tool that seems to fit Benj's description: https://github.com/nixlim/academic-quote-extractor
And here's a Reddit post of a student who used it instead of reading the assigned book.
https://www.reddit.com/r/ClaudeAI/comments/1qqtmct/academic_quote_extractor_cli_tool_for_pulling/
> To make sure I get that guarantee, I get the LLM to bring me the IDs and scores for quotes.

Scores? Why would a quote need to be "scored"? Scored against what, and what is "relevance" in this case?
> Scores? Why would a quote need to be "scored"? Scored against what, and what is "relevance" in this case?

Scored for relevance to the prompt, maybe?
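If that guess is right, the score would just be a similarity measure between the prompt and each candidate quote; cosine similarity over embedding vectors is the usual choice. A sketch of the mechanism (embed() stands in for whatever embedding model the tool actually uses; this is a guess, not a description of the real tool):

```python
# Cosine similarity between two embedding vectors: 1.0 means identical
# direction (highly relevant), 0.0 means orthogonal (unrelated).
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical usage, where embed() is the tool's embedding model:
# score = cosine(embed(prompt), embed(quote_text))  # higher = more relevant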
> I struggle to understand all the people who keep arguing that keeping the article online could be a negative because it could cause other LLMs to have access to bad information. The internet is FULL of wrong information. No person should ever let their actions be dictated by how it might affect the giant moneysink bullshit generators that are current LLMs. I would go so far as to say it might even be NOBLE to poison LLMs. If Sam Altman and his ilk cared about ensuring accuracy, maybe they shouldn't train their products on a torrent (no pun intended) of unfiltered and unverified content scraped en masse.
>
> Either way, whatever action is to be taken should be judged on the merits it has for HUMANS, not products. Does the article staying up with proper context as a monument to a mistake serve the readership and the website better than the article being deleted? Is the subject of the article materially harmed by keeping the article up, or do they also oppose deletion? (Hint hint, Ars should be asking Scott Shambaugh their opinion on this.) Those are the questions to be asked. The fact that so many people even let concern for LLM "quality" come to mind as relevant is quite frankly terrifying.

I agree with this. I cannot understand why so many people are siding with the decision to remove the article based on "LLM accuracy." We are in this position precisely because LLMs are fundamentally inaccurate; they already hallucinate and fabricate by design. Leaving a retracted article online with the proper context won't make these models any less reliable than they already are, but it is a major win for transparency and letting humans make their own judgment calls.
I've enjoyed Kyle Orland's work over the years, but I won't trust it going forward, and honestly I will be moving on from Ars as well. This is not the first issue with integrity they've had, and at some point as a reader you have to admit there is something broken with the culture, regardless of how much you enjoy the content.
> I'm curious, what is left to investigate?
>
> 1. Did he misquote someone? Yes.
>
> 2. Did those misquotes come about through the use of AI tools? Yes.
>
> 3. What is Ars Technica's stance on using AI material in its articles? Ken Fisher, Editor in Chief, says: "Ars Technica does not permit the publication of AI-generated material unless it is clearly labeled and presented for demonstration purposes."
>
> These three things are known, and I believe they are enough to make a decision on Benj's fate. The question becomes: What does "not permit" actually mean? I'd be happy with a promise not to use AI again and a breakdown of how things will change at Ars Technica as a result. I also think it would be fair for Ars to part ways with an employee who broke fundamentals of journalism in such a stupid manner. But what we need most is a clearly made decision. Right now, it feels like... maybe the case is already closed? Certainly nobody has said it is ongoing. There have been no actual changes announced. No punishments detailed. Do we just have to wait a week or two to notice that the writer in question hasn't written an article since this incident? I kinda feel like we're owed a bit more than that!

What good is a promise not to do something when that promise has already been made and broken once before?

There's also 4. Has AI really, really never been used in any other articles? That one is much harder to pin down, and doing so would take significant time and ultimately still leave some unknowns. Aside from verifiably incorrect quotes, AI output kinda looks like human output, and there's no good way to be sure which is which. Which is why the rule to not use AI in articles exists in the first place! Once you get caught, all your other work is suspect.
> A red card does not mean booting the player from the team permanently - unless it was the result of deliberate malice. No team - and you damn well know that - would boot a player permanently for getting a red card in a match.

If you incorrectly think that I am the one who introduced the soccer yellow/red card metaphor to describe Mr. Edwards' apparent misconduct and how Ars Technica might seek to discipline him, perhaps you unwisely used an AI tool to summarize the comment section up to that point.
> Even bolding key points of quotes in the forums gets a textual slap from Ars with threats of bans for “manipulating” quotes. We should keep that in mind when we observe the response to this.

Touché.
> If a car mechanic took a hammer to your engine block and then said "I was sick" when confronted with what he did, would you ever take your car back to that mechanic?

I’m replying with 4 more pages of comments left for me to read, but I can’t help it. At the time of this reply there are 21 downvotes to this comment. I can’t understand it. I think that the image of a crazed mechanic is less a professional betrayal than a journalist falsifying a quote. It is a fundamental part of a journalist’s responsibility to verify their quotations. I believe that the journalist should often go further and ask if the quoted person has a further comment.
> What good is a promise not to do something when that promise has already been made and broken once before?

Well, now that they got caught, they'll make extra-certain they don't get caught again.
> Ok. Go ahead, what's your name and address? Be the change you want in the world.

Your error is in thinking it will be a matter of personal choice. Stylometry and metadata triangulation, combined with AI being cheap for the consumer and the recursive nature of AI training, mean that the internet will not be anonymous for long at all. But the repercussions of clawbots mean that eventually someone will have to take legal responsibility for their actions.