Deloitte will refund Australian government for AI hallucination-filled report

Status
You're currently viewing only struwell's posts. Click here to go back to viewing the entire thread.

struwell

Seniorius Lurkius
46
I want to be fair about this, because I generally think ChatGPT is a useful tool for lit searches and summaries of papers (as with any summary, some nuance is lost). However, once I asked it for sources on a certain topic and it responded with hallucinated papers. My first clue that something wasn't quite right was when one of the papers (of which I was not previously aware) listed me as the first author...
 
-10 (49 / -59)

struwell

Seniorius Lurkius
46
Genuine question -- that makes me doubt the value of the summaries. How can we know a summary is correct without reading the paper itself? Is there any research on not just lost nuance, but on hallucinations in AI summaries? I'd be interested in seeing it compared across approaches, such as NotebookLM and Kagi, which can pin to a set of sources, or requests to summarize a single paper across different models.

I occasionally use AI to summarize things, but I don't trust it past summarizing things where the ultimate goal is to point me to the actual authoritative source when I'm having a hard time finding it. Do you trust the summaries you get? And if so, why?

Yeah, I was skeptical. I baselined this by using ChatGPT to summarize papers I'd already read, or on topics I'm already an expert in. Usually the summary was a rewording of the abstract, with maybe some additional context from the paper. I've found these summaries to be pretty good, but you have to check.

In the case I mentioned, the statement ChatGPT made about the topic is very probably correct, but the sources for the correct statement were made up.

This is a theme -- I've asked ChatGPT how to make calculations using a code that is popular in my field (and which I've used for about a decade), just as a curiosity. It is correct about what the code ought to be able to do, and it is even correct about how the code would do it, but it totally makes up the keywords you'd need to set in the input file to make this happen.

Quite bold of Deloitte to use it without checking, since checking is a trivial thing to do. If I were a consulting firm in the age of generative AI, I would want to be very clear on how I add value beyond the AI -- here it is deeply unclear how Deloitte did.
 
34 (42 / -8)