Deloitte will refund Australian government for AI hallucination-filled report

Kyle Orland

Ars Praefectus
3,434
Subscriptor++
Genuine question - that makes me question the value of the summaries. How can we know the summaries are correct without reading the paper itself? Is there any research on not just lost nuance, but the hallucinations in AI summaries? I'd be interested in seeing it across approaches, such as NotebookLM and Kagi, which can pin to a set of sources, or requests to summarize a single paper across different models.

I occasionally use AI to summarize things, but I don't trust it past summarizing things where the ultimate goal is to point me to the actual authoritative source when I'm having a hard time finding it, so I can verify the summary. Do you trust the summaries you get? And if so, why?
Funnily enough, the Australian government studied this and found that AIs were worse than humans at summarizing!

https://arstechnica.com/ai/2024/09/...-ai-is-much-worse-than-humans-at-summarizing/
 
Upvote
59 (59 / 0)

norton_I

Ars Praefectus
5,813
Subscriptor++
This. I absolutely cannot understand why a consultancy, whose entire business model is "pay us large sums of money for our experts' advice," would rely on an LLM for even as much as grammar advice.

If your expert is ChatGPT, why do I pay you? I can write prompts myself. This is an incredibly fast way to sink your entire business model. If I were McKinsey or one of the others, I'd be out there advertising "We know what we're doing; we don't need AI to do it poorly."

Except Deloitte and McKinsey's main product is the laundering of responsibility, not their expertise. It's about having someone to blame rather than getting results. So using ChatGPT in house doesn't provide that. On the other hand: if big-name accounting and consulting companies can use ChatGPT to cut their own costs, they will.
 
Upvote
49 (49 / 0)

Alethe

Ars Centurion
240
Subscriptor
I'm not familiar with the Australian legal system, but they should actually have to pay a penalty multiplier for fraud, in other words pay more than they received for the fraudulent report. It also appears that they started with the conclusion that they wanted, and are claiming that their fraud does not alter the conclusion.
Agreed. A partial refund merely lowers the incentive to do it again. What's really required is a financially painful fine to act as deterrent. For these clowns and others.
 
Upvote
27 (27 / 0)

graylshaped

Ars Legatus Legionis
67,926
Subscriptor++
"We stand by our completely trustworthy work for its accuracy except where it isn't because we made shit up," says lazy, lying consultants after being caught for their lazy lying.

Sounds like "AI" shills in general. "This stuff is mostly perfect!" remains the refrain. In this case, "It's only ten percent utter bullshit."
 
Upvote
25 (25 / 0)

norton_I

Ars Praefectus
5,813
Subscriptor++
Anybody thinking about shorting Deloitte?

No, because like the rest of the Big Four, they are a privately held company. Historically, accounting firms (like law firms) were prohibited from forming corporations because, as members of a profession with a binding code of ethics, when they audited a financial report they were personally attesting that the work had been done correctly according to the standards of the profession. The owners (partners) retained unlimited personal liability for the professional conduct of the entire organization, and thus were not allowed to form a corporation that would limit the owners' liability.

Of course, that was a long time ago. Now we have decided that holding people responsible for misconduct is crazy nonsense, as long as they commit it in the service of making rich people richer, so we have created structures that effectively eliminate their liability. But these companies are still privately held because of that historical artifact.
 
Upvote
42 (42 / 0)

terrydactyl

Ars Tribunus Angusticlavius
7,886
Subscriptor
5. Pants were lost.

Given that Deloitte's business model is to hire a whole bunch of kids right out of college, kinda supervise them a little bit, and charge boatloads of money for whatever flows out, this result is entirely unsurprising. And this is far from the first time that shoddy work has gotten them in trouble.
Yet governments and corporations keep using Deloitte. It's as if they are as stupid as the LLMs.
 
Upvote
19 (19 / 0)

John Abbe

Smack-Fu Master, in training
53
Interviewer: This report that Deloitte produced for Australia's Department of Employment and Workplace Relations this week...

Deloitte executive: The one with the false citations?

Interviewer: Yeah.

Deloitte executive: Yeah, that’s not very typical, I’d like to make that point.

Interviewer: Well, how was it un-typical?

Deloitte executive: Well there are a lot of these reports going around the world all the time, and very seldom does anything like this happen. I just don’t want people thinking that Deloitte's reports have false citations.

Interviewer: Did this report have false citations?

Deloitte executive: Well, I was thinking more about the other ones.

...
 
Upvote
59 (59 / 0)

jdale

Ars Legatus Legionis
18,335
Subscriptor
While I am in favor of sticking it to people who use AI like this in any way possible, I think you'd probably find it hard to make the case that it's "defaming" her, unless the made-up papers are about her doing experiments on humans or something of that magnitude. While it's wrong, if someone cited a fake paper I wrote on an inconclusive drug trial, it'd be hard to show that it had somehow damaged my reputation.
Perhaps there's a claim under right of publicity or fraud.

It's unfortunate that the law is relatively tolerant about falsehoods, even when they are deliberate or (as here) clearly negligent. That's left us very unprepared for the modern age.
 
Upvote
16 (16 / 0)

ashypans

Wise, Aged Ars Veteran
101
Subscriptor
LLMs also seem to have a propensity to mis-cite. Those are a lot harder to identify, obviously, but several colleagues are seeing one or two of their papers surge with citations that don't exactly match the topic. It's not universal; most of their papers aren't getting hit, but one or two just have the magic je ne sais quoi to set off those LLMs. Makes me think there may be a way to put a few LLM honeypot papers into the ecosystem if we could figure out the right conditions.
 
Upvote
24 (24 / 0)

trevor_darley

Wise, Aged Ars Veteran
161
the substance of the independent review is retained, and there are no changes to the recommendations.
I developed a cancer-style staging system for the spread of GenAI adoption.

Stage 0: Tech CEOs
Stage 1: Tech enthusiasts
Stage 2: The general public
Stage 3a: Respected professionals (like nurses)
Stage 3b: Highest respected individuals (like doctors)
Stage 4:

I hadn't quite figured out Stage 4 yet before this article. I already knew we were at stage 3a because I saw nurses talking about how wonderful ChatGPT is, but now I know we're at Stage 4, and it includes governments. Great.
 
Upvote
15 (15 / 0)

Tobold

Ars Tribunus Militum
2,003
Subscriptor++
I want to be fair about this, because I generally think ChatGPT is a useful tool for lit searches and summaries of papers (as with any summary, some nuance is lost). However, once I asked it for sources on a certain topic and it responded with hallucinated papers. My first clue that something wasn't quite right was when one of the papers (of which I was not previously aware) listed me as the first author...
The problem with counting on LLMs for lit searches is that they often fail on very key points, such as "Does this paper support or reject a particular hypothesis?" They can write a nice paragraph summarizing a paper, but they are far too easily confused on the key points because they can't really distinguish key points from secondary ones.

If you want paragraphs of nonsense, LLMs are a great tool. If you want understanding, they are no substitute for some reading.
 
Upvote
27 (27 / 0)

Emmanuel Deloget

Wise, Aged Ars Veteran
136
Subscriptor
Don't these management consulting companies run on fresh-out-of-college "consultants" who have zero or close to zero real world experience?

ETA ninja'ed


The sarcastic answer would be: as the target audience is made up of technocrats, I'm not sure that they are equipped to assess any kind of real-world experience anyway :)
 
Upvote
6 (6 / 0)

iquanyin

Ars Tribunus Militum
2,073
"It is difficult to get a man to understand something, when his salary depends on his not understanding it." Attributed to Upton Sinclair, but unverified.
i usually see it attributed to mark twain. (also, speaking of upton, people might want to (re)read The Jungle. that was his novel about the meat packing industry that aimed to help meat packing workers but instead got the president to start the FDA. seems timely, what with rfk, jr and all that.)
 
Upvote
9 (9 / 0)

Cthel

Ars Praefectus
9,869
Subscriptor
Interviewer: This report that Deloitte produced for Australia's Department of Employment and Workplace Relations this week...

Deloitte executive: The one with the false citations?

Interviewer: Yeah.

Deloitte executive: Yeah, that’s not very typical, I’d like to make that point.

Interviewer: Well, how was it un-typical?

Deloitte executive: Well there are a lot of these reports going around the world all the time, and very seldom does anything like this happen. I just don’t want people thinking that Deloitte's reports have false citations.

Interviewer: Did this report have false citations?

Deloitte executive: Well, I was thinking more about the other ones.

...
So you're saying that these reports are written to very rigorous professional standards? No paper or paper derivatives? No Markov chains or Markov chain derivatives?
 
Upvote
6 (6 / 0)

SeanJW

Ars Legatus Legionis
11,892
Subscriptor++
Interviewer: This report that Deloitte produced for Australia's Department of Employment and Workplace Relations this week...

Deloitte executive: The one with the false citations?

Interviewer: Yeah.

Deloitte executive: Yeah, that’s not very typical, I’d like to make that point.

Interviewer: Well, how was it un-typical?

Deloitte executive: Well there are a lot of these reports going around the world all the time, and very seldom does anything like this happen. I just don’t want people thinking that Deloitte's reports have false citations.

Interviewer: Did this report have false citations?

Deloitte executive: Well, I was thinking more about the other ones.

...

The front fell off?
 
Upvote
12 (13 / -1)

happy_heyoka

Wise, Aged Ars Veteran
114
Subscriptor++
"Deloitte's "Targeted Compliance Framework Assurance Review"...focuses on the technical framework the government uses to automate penalties under the country's welfare system.".

I am not assured that the penalties will be compliant. Take them to court if you can.
Sounds like Robodebt v2.0 is in preparation.

That worked out so well the first time 🫤
https://www.google.com/search?q=fallout+from+robodebt
 
Upvote
9 (9 / 0)

Shiunbird

Ars Scholae Palatinae
740
Earlier this year, Deloitte declared it would start using generative AI for its reports as a way of enhancing the value provided to its clients. I don't remember if they said it in a specific report or not, but I recall seeing it.

The citation issue continues to trip people up across the spectrum, from lawyers to business analysts. It's striking how many supposedly smart people do not understand the limits of the tools they insist will deliver such amazing value.
Greed-coated lenses are 100% opaque.
 
Upvote
13 (13 / 0)

Shiunbird

Ars Scholae Palatinae
740
It looks like AI might destroy the consultancy industry. Why pay millions to a fancy consultant when one can ask an LLM to crank out an equally worthless report? Management pays consultants to justify decisions they have already made and provide a way to deflect blame when they go awry. It sounds a lot cheaper to fire up an LLM, get the nonsense one wants and have a dumb computer to blame.
And the beautiful revolving door:

1. Make a shitty decision, blame the consulting company. If you are golden-parachuted nevertheless, you get a consulting job.
2. Reciprocate the safety cushion by guaranteeing a high level job for any consultant that has "served" your company, shielding them from any consequences from bad decisions.

Tale as old as billable hours.
 
Upvote
13 (13 / 0)

JoHBE

Ars Praefectus
4,224
Subscriptor++
Genuine question - that makes me question the value of the summaries. How can we know the summaries are correct without reading the paper itself? Is there any research on not just lost nuance, but the hallucinations in AI summaries? I'd be interested in seeing it across approaches, such as NotebookLM and Kagi, which can pin to a set of sources, or requests to summarize a single paper across different models.

I occasionally use AI to summarize things, but I don't trust it past summarizing things where the ultimate goal is to point me to the actual authoritative source when I'm having a hard time finding it, so I can verify the summary. Do you trust the summaries you get? And if so, why?

But...but... Reading the summary is MORE EFFICIENT!!!
 
Upvote
10 (10 / 0)

Teom

Wise, Aged Ars Veteran
188
The report, which cost Australian taxpayers nearly $440,000 AUD (about $290,000 USD), focuses on the technical framework the government uses to automate penalties under the country's welfare system.
Is the purpose of this framework to save taxpayer money by improving the penalizing of welfare recipients who either aren’t eligible or didn’t follow the rules when applying? If so, then how many welfare recipients need to be penalized to make up $440,000 AUD? I’d really like to see that calculation.

Also happy labour day Aussies!
 
Upvote
26 (26 / 0)

Cassius Kray

Ars Praetorian
404
Subscriptor
Yeah, I was skeptical. I baselined this by using ChatGPT to summarize papers I'd already read, or on topics I'm already an expert in. Usually the summary was a re-wording of the abstract, with maybe some additional context from the paper. I've found these summaries to be pretty good, but you have to check.

In the case I mentioned, the statement ChatGPT made about the topic is very probably correct, but the sources for the correct statement were made up.

This is a theme -- I've asked ChatGPT how to make calculations using a code that is popular in my field (and which I've used for about a decade), just as a curiosity. It is correct about what the code ought to be able to do, and it is even correct about how the code would do it, but it totally makes up the keywords you'd need to set in the input file to make it happen.

Quite bold of Deloitte to use it without checking. It's a trivial thing to do. If I were a consulting firm in the age of generative AI, I would want to be very clear on how I add value to AI -- here it is deeply unclear how Deloitte did.

Your process of 'baselining' exposes a fundamental flaw in many people's use of GenAI, potentially the same attitude that led the authors of Deloitte's report to insert a load of random citations.

The fact that you've checked a few summaries and they looked good should give you absolutely no confidence about the likelihood of future summaries being entirely accurate. That's simply not how current GenAI works. The risk of hallucination is inherent and continues to exist even in situations where the output is often correct.

While you can put in the effort to double-check that the output is a fair representation of the text, how can you possibly know that nothing important has been missed without reading the whole thing?

If you need an accurate summary you can't trust GenAI. And if you're happy to accept the risks that the summary isn't accurate, then you don't need the summary.
 
Upvote
36 (36 / 0)

Fatesrider

Ars Legatus Legionis
25,121
Subscriptor
Perhaps Lisa Crawford has a case for defamation or slander for having these false papers attributed to her. One way to stop the nonsense is to make it hurt. As it is, they are partially refunding the money, but clearly all they did was engineer a few AI prompts to get the report. Make them refund it all, make them pay for defamation, and send a message that this crap isn’t okay.

Same for the lawyers who submit briefs to the court with fake legal citations.
Yeah, law firms who allow this stupid shit have money and lazy attorneys. Fining them is just the cost of doing business.

Hit them in their livelihood. Suspend the filing attorney's license until they retake the bar exam.

Specifically the parts focusing on ethics and the limitations of AI (assuming they have it, if not, make that part and add it).

Hit THEM in their ability to make a living and the practice would slow (but not stop). The only way to get it to stop would be a three-strikes rule. And by "strikes", I mean total fake citations, not briefs with bad citations, in a year. If they let one or two pass in a year, they retake the bar for each one. Three? Disbarred for life.

Even if it's in the same filing.

Eventually, the law firms would realize that AI isn't worth the risk and actually do their fucking jobs for a change by banning using AI at all, which is the whole point to the punishment.
 
Upvote
11 (11 / 0)

Wheels Of Confusion

Ars Legatus Legionis
75,589
Subscriptor
Why do people apparently hate thinking for themselves so much that they immediately abdicate that responsibility to an algorithm at the first opportunity?

Agent Smith was right
Not necessarily "people" in this case, but "a consultancy." And the answer is because it was cheaper for them to do it, if they got away with it.
 
Upvote
10 (10 / 0)

niftykev

Ars Scholae Palatinae
754
This has not been a good time for Deloitte: this mess with AI, and the mess their former CEO is making in the WNBA. Anybody thinking about shorting Deloitte? For a professional services organization, using AI to do professional work sure isn't helping, and neither is the notorious work of its former CEO Cathy Engelbert.
Deloitte does a good job with the UEFA Money League webpage detailing the revenue of the top football clubs in Europe.

Of course, it'll probably be terrible if they let an LLM do anything with it.
 
Upvote
-8 (0 / -8)

Chuckstar

Ars Legatus Legionis
37,290
Subscriptor
[attached image]
As true as it's ever been.


Fun story involving non-Deloitte consultants: I was vehemently opposed to a project being forced through at my last company, not because I thought the goal was bad (in fact, it was a good goal, if implemented by competent people), but because the consulting company and integrator for some of the stuff was setting off all sorts of alarm bells in my talks with them. One of my last major acts of resistance was putting together a slideshow (execs love those) giving a succinct list of issues that would be encountered if they took this approach, along with the upfront and predicted long-term effects of those issues.

The astute reader will guess that yes, they did proceed with reckless abandon, shortly after I departed that company. One of my coworkers kept a copy of the presentation and, as they ran into each issue, annotated it with the date they hit it, along with little screenshots from Slack and emails of people freaking out. When it was all done, he realized it was 12 slides long, printed it as a calendar, and sent it to me :ROFLMAO:
I would just say that “consulting” is such a broad term, that there is room for all kinds of outcomes. At one company I worked at, they brought in a consultant to make our undergraduate recruiting process better. The process they came up with worked great, but only for the first year after it was introduced. The problem was that the process required training and coordination among the people assigned to interview undergraduates each year. The company only bothered with the training that first year. Didn’t even bother providing the useful pre-printed forms the following year.

The consultant did great. The company dropped the ball.

For anyone curious, what they suggested was that when we brought candidates in for second-round interviews, where they’d meet six people, instead of each person asking similar surface-level questions, each interviewer would be assigned a topic to dig into: quantitative skills, educational background, ability to work with a team, etc. That allows for full coverage (you make sure there isn’t a candidate where no one got around to asking about quantitative skills, for instance) and allows for digging deeper into each topic. It also makes comparisons easier, since the same interviewer will have asked each candidate about quantitative skills. As recruiters, we found it much more useful. Even candidates found it a better process, as they felt they had been fully evaluated, not just judged based on similar superficial questions asked by six different people (plus it’s just mind-numbing to be asked the same superficial questions in six interviews over three or four hours).

The long-term problem is that all the interviewers have to understand the process and take the time beforehand to coordinate who is assigned which topic. Without that preparation, the whole thing easily falls apart.

The same company had also brought in consultants to build a Visual Basic-based set of plug-ins for Excel and PowerPoint to make presentations easier to build, more consistently formatted, and easier to move slides between. It cost a fortune (although not that much compared to how much less time people spent messing with formatting, and simply how much better presentations looked), but I remember thinking “wow… this is really how PowerPoint should work. Microsoft should just rebuild PowerPoint like this.”

EDIT: I could also provide a list of failed consultant projects, though. And like in your post, that we all knew would fail, for whatever specific reasons.
 
Upvote
14 (14 / 0)

Zeppos

Ars Tribunus Militum
2,915
Subscriptor
Sorry for venting, guys, but I have had it with these consultants. I once talked to a few Deloitte guys at a job fair. We found out within a few minutes that we were not a match. My God, what a bunch of presumptuous assholes. As we say here in Dutch, they were dropped upwards. Corporate speak on steroids to package ordinary ideas a local farmer could come up with. Lots of wrapping, little substance. Why am I not surprised they used an AI LLM tool to help them wrap things up? It is perfectly suited for that. Maybe they need to buy more expensive suits to hide their unremarkable intelligence even more.
 
Upvote
23 (23 / 0)
Earlier this year, Deloitte declared it would start using generative AI for its reports as a way of enhancing the value provided to its clients. I don't remember if they said it in a specific report or not, but I recall seeing it.

The citation issue continues to trip people up across the spectrum, from lawyers to business analysts. It's striking how many supposedly smart people do not understand the limits of the tools they insist will deliver such amazing value.
How much of that "amazing value" is offset by the negative PR and refunds to clients after the AI use and consequent hallucinations are made public?
 
Upvote
10 (10 / 0)