Did ChatGPT help health officials solve a weird outbreak? Maybe.

Uncivil Servant

Ars Scholae Palatinae
4,667
Subscriptor
Dr Mole said:
Some of the questions are easy enough to answer without a chatbot. A simple search on PubMed, a federal database of scientific literature, quickly pulls up examples of Salmonella being found in ice, for example.

Thank you! Every time I hear about people wanting to use LLMs as a medical search engine I wonder if I've spent the past two decades hallucinating this tool that we already have!

And it matters. Infectious disease specialists, epidemiologists, and gastroenterologists are going to have a much better grasp of the essential context. A chatbot that pulls some answer out of the aether is useless because I need to know that source!
 
Upvote
364 (369 / -5)

Happy Medium

Ars Tribunus Militum
2,147
Subscriptor++
This is the incredible danger of current LLMs. They use incredibly compelling language to assert a confidence the system itself is literally NOT CAPABLE of having. Yes, the LLM said that ice was a "credible and likely" source, but ChatGPT isn't really able to judge that; what it is doing instead is predicting that the words "credible" and "likely" are the most appropriate next words in a response!

Even if you know this is the major flaw of LLMs, it's really easy to fail to correct for that false assertion of confidence. Humans are creatures of language, and we're "programmed" to interpret confident language as evidence of knowledge and expertise. Even experts in the field (and TBH a health department should be an expert in public health outbreaks) can obviously be fooled into relying on LLM assertions because of this.
 
Upvote
302 (308 / -6)

charliebird

Ars Tribunus Militum
2,355
Subscriptor++
I've had a few ongoing, very minor medical issues that I've mentioned to doctors with no success (Seborrheic dermatitis is one I've had for years and years). They usually shrugged their shoulders and said, "That’s weird," and didn't offer a helpful suggestion. I gave the symptoms to ChatGPT, and it diagnosed the problem right away and suggested an over-the-counter treatment which worked. It was honestly pretty amazing. I’m not saying this is a substitute for real doctors, and I’m sure a specialist would have diagnosed the same thing. But as a supplement to medical professionals, there’s value, I reckon.
 
Upvote
-24 (88 / -112)

KingKrayola

Ars Tribunus Militum
1,619
Subscriptor
This is the incredible danger of current LLMs. They use incredibly compelling language to assert a confidence the system itself is literally NOT CAPABLE of having. Yes, the LLM said that ice was a "credible and likely" source, but ChatGPT isn't really able to judge that; what it is doing instead is predicting that the words "credible" and "likely" are the most appropriate next words in a response!

Even if you know this is the major flaw of LLMs, it's really easy to fail to correct for that false assertion of confidence. Humans are creatures of language, and we're "programmed" to interpret confident language as evidence of knowledge and expertise. Even experts in the field (and TBH a health department should be an expert in public health outbreaks) can obviously be fooled into relying on LLM assertions because of this.
I guess if you use it like a crappier wikipedia, it can be a good source of references if you ask for them and do a bunch of reading of said references yourself.

I've started to use LLMs to keep me out of internet rabbit holes but I wouldn't use them for much more than 'internet averaging'. Sounds like these officials did the same as part of a brainstorm?
 
Upvote
53 (60 / -7)

Bongle

Ars Praefectus
4,461
Subscriptor++
Confirming hypotheses seems like a really rough use of LLMs. When it's a yes/no answer, then it's just predicting plausible words. Combined with the makers' tendency to make them as sycophantic as possible, it's not a good use of the tech.

I find them alright at generating hypotheses, as long as they're not too costly to evaluate.
 
Upvote
113 (114 / -1)

danbert2000

Ars Praetorian
560
Subscriptor++
The note at the end about verifying the results taking just as much time as doing it yourself is so true. As a developer, I've had mixed success with giving GitHub Copilot too much to do at once because it will confidently finish a broken solution instead of pumping the brakes like a human might do to check assumptions. I've had it save me hours of work and had it spit out code that needed just as much time to fix as it would have to write myself. The issue is you don't really know until you thoroughly check it yourself, so you either give it unearned trust or you verify everything and gain little to no productivity. Looks like public health and science usage is similar.
 
Upvote
163 (164 / -1)

michaeltherobot

Wise, Aged Ars Veteran
181
I can definitely see the use of LLMs to help point someone in the right direction. Sure, PubMed another tools may be available if we simply go use them, but sometimes honestly my brain locks up and I can't even think of where to find sources. Brainstorming with an LLM and then demanding it cite its sources for me to check is, at least for me, an upgrade over raw Googling.
 
Upvote
21 (35 / -14)

ArsMetaluna

Smack-Fu Master, in training
98
Thank you! Every time I hear about people wanting to use LLMs as a medical search engine I wonder if I've spent the past two decades hallucinating this tool that we already have!


And for this case in particular: I can't believe the collective County Health Officials didn't all recoil in horror when they heard the words, "... reportedly hosed off ..." That phrase is a sure butt-clencher.
 
Upvote
82 (82 / 0)

Uncivil Servant

Ars Scholae Palatinae
4,667
Subscriptor
I've had a few ongoing, very minor medical issues that I've mentioned to doctors with no success (Seborrheic dermatitis is one I've had for years and years). They usually shrugged their shoulders and said, "That’s weird," and didn't offer a helpful suggestion. I gave the symptoms to ChatGPT, and it diagnosed the problem right away and suggested an over-the-counter treatment which worked. It was honestly pretty amazing. I’m not saying this is a substitute for real doctors, and I’m sure a specialist would have diagnosed the same thing. But as a supplement to medical professionals, there’s value, I reckon.

This is extremely dangerous for reasons that may not be immediately apparent. Because LLMs only preserve the connections between word-pieces, and not the actual meaning, you can have situations where an LLM considers "hyper" and "hypo" to be words that are both associated with a given suffix. And then it will take that suffix and determine the next token.

I have health issues that result in hypotension. Think about how an LLM is going to parse that, given that any training information is going to have at least 10:1 instances of hypertension to hypotension. So, how do I ensure that it picks up only connections between the whole word hypotension and not the token "tension"?
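For illustration, the word-piece sharing described above can be sketched with a toy greedy longest-match tokenizer. The vocabulary here is entirely hypothetical; real models learn BPE merges from data and may split these words differently. The point is just that two opposite words can end on the same piece:

```python
# Toy greedy longest-match tokenizer. Hypothetical vocabulary, for
# illustration only: real BPE vocabularies are learned from a corpus.
VOCAB = {"hyper", "hypo", "tension", "ten", "sion"}

def tokenize(word: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:                              # no match: emit a single char
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("hypertension"))  # ['hyper', 'tension']
print(tokenize("hypotension"))   # ['hypo', 'tension']
```

Under this (assumed) segmentation, both words share the "tension" piece, which is the pooling effect the post worries about.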

Now, zooming out, consider health issues where medical literature is disproportionately taken from male subjects, which is basically every specialty except OB/Gyn, and consider whether a woman should even consider using an LLM for medicine.
 
Upvote
193 (201 / -8)

issor

Ars Praefectus
5,621
Subscriptor
This is the incredible danger of current LLMs. They use incredibly compelling language to assert a confidence the system itself is literally NOT CAPABLE of having. Yes, the LLM said that ice was a "credible and likely" source, but ChatGPT isn't really able to judge that; what it is doing instead is predicting that the words "credible" and "likely" are the most appropriate next words in a response!

Even if you know this is the major flaw of LLMs, it's really easy to fail to correct for that false assertion of confidence. Humans are creatures of language, and we're "programmed" to interpret confident language as evidence of knowledge and expertise. Even experts in the field (and TBH a health department should be an expert in public health outbreaks) can obviously be fooled into relying on LLM assertions because of this.
I agree with your overall point, but one clarification: it is not responding with “credible and likely” just because that is statistically a good response in general. Otherwise it would never respond in the negative, which it does (ask whether it is possible that the salmonella is present due to spontaneous generation). It is generating the statistically likely response based on the input provided.

I’ve done my fair share of stupidly arguing with LLMs that I can tell are leading me on a wild goose chase.
 
Upvote
-5 (18 / -23)

DrCreosote

Wise, Aged Ars Veteran
131
This is the incredible danger of current LLMs. They use incredibly compelling language to assert a confidence the system itself is literally NOT CAPABLE of having. Yes, the LLM said that ice was a "credible and likely" source, but ChatGPT isn't really able to judge that; what it is doing instead is predicting that the words "credible" and "likely" are the most appropriate next words in a response!

Even if you know this is the major flaw of LLMs, it's really easy to fail to correct for that false assertion of confidence. Humans are creatures of language, and we're "programmed" to interpret confident language as evidence of knowledge and expertise. Even experts in the field (and TBH a health department should be an expert in public health outbreaks) can obviously be fooled into relying on LLM assertions because of this.
That's something I've been finding more than a little annoying about AI assistants. They feign cheer for helping and tell me everything is the greatest, most powerful, most superlative ever. They are the cheerful all-knowing assistant. Even when I instruct one to be objective, I still get the sense it is patronizing me. I would like it a lot more if it would just generate a flat response without trying to engage my enthusiasm.
 
Upvote
58 (58 / 0)

graylshaped

Ars Legatus Legionis
67,689
Subscriptor++
Some of the questions are easy enough to answer without a chatbot.
And LLMs remain a solution in search of a problem.

The common element was they drank beer kept in a jury-rigged cooler made from farm equipment that was not well cleaned and had leftover food of dubious provenance in it. Duh. We don't need Hercule Poirot here.

With due respect to what I am sure are the fine folks at the Brown County Health Department, all that "AI" did in this case was to damage their professional reputation when they used it in some weird attempt to justify their eminently reasonable conclusion. It would be one thing if it had told them something they hadn't thought of, but I prefer my paid professionals to have and exercise the sense Somebody gives a goose.
 
Upvote
146 (150 / -4)
The tendency of LLMs to just make references up is pretty well known and has ended many a legal and consulting career already.
Unfortunately, it appears this tendency may have hit an Ars writer's career as well.

Which is really why this is such an insidious aspect of LLMs. Even someone who clearly knows better can pretty easily be sucked in by their consistent confidence and tendency to get things mostly right most of the time.

When it comes to AI summaries in search, for instance, I've certainly gotten quick answers to technical questions. But even when the information I needed was accurately represented, I think I have yet to run across a single answer that, once I took the time to fully vet it, wasn't subtly wrong in at least one way.

The fact that these AI summaries will confidently give out inaccurate medical advice still blows my mind. I would have thought the liability there would potentially be so catastrophic the lawyers never would have allowed AI responses in that context.
 
Upvote
106 (107 / -1)

Uncivil Servant

Ars Scholae Palatinae
4,667
Subscriptor
That's something I've been finding more than a little annoying about AI assistants. They feign cheer for helping and tell me everything is the greatest, most powerful, most superlative ever. They are the cheerful all-knowing assistant. Even when I instruct one to be objective, I still get the sense it is patronizing me. I would like it a lot more if it would just generate a flat response without trying to engage my enthusiasm.

That's because an LLM has no motive, and we're used to automatically guessing people's motives in any conversation. Motives don't have to be nefarious: for most of us, posting on Ars is primarily motivated by boredom, killing time, etc., as well as an interest in the subject. If someone was always posting about how Bitcoin is the future, people would similarly make some assumptions about their motivations.

LLMs have no motivations, so when we naturally try to guess, it comes across as being fake and insincere in ways that are almost baffling, because we aren't used to a conversation without a motive or any operating theory of mind as we know it. And of course, the LLM cannot understand your motivations and won't respond to them as we expect.
 
Upvote
44 (47 / -3)

Cassius Kray

Ars Centurion
396
Subscriptor
Is it just me or is that weekly report something of a mess? I'm not convinced the bit about the effectiveness of AI is accurate at all. Firstly because it doesn't really make sense (how did ChatGPT help with situational awareness?), secondly because the key phrase is duplicated elsewhere in the report:
Although this technique did not follow a traditional surveillance protocol, AI was effective in this rural setting for rapid situational awareness and early case finding, especially because formal case reporting was delayed or limited.
In a small community, monitoring social media posts and photos, as well as personally contacting fair board members and persons who health department staff members had encountered at the fair, contributed to rapid situational awareness and early case finding but also contributed to reluctance to report, for fear of implicating a friend or neighbor as contributing to the outbreak.
And thirdly because elsewhere in the report it directly contradicts that first quote:
AI was not used for case finding...
 
Upvote
86 (86 / 0)

FangsFirst

Ars Centurion
213
Subscriptor++
I agree with your overall point, but one clarification: it is not responding with “credible and likely” just because that is statistically a good response in general. Otherwise it would never respond in the negative, which it does (ask whether it is possible that the salmonella is present due to spontaneous generation). It is generating the statistically likely response based on the input provided.
This is a worthwhile clarification, but it shouldn't be mistaken for definitively indicating that the association of "credible and likely" is tied to the most important concepts in the prompt.

Given the nature of LLMs, it's very likely that this is a sensible association with the prompt as a whole, but the weight of the various words in the prompt could also adjust how "correct" that response is (either toward a very "proper" weighting conceptually, or toward one that leans on the less important bits).
 
Upvote
18 (18 / 0)

Happy Medium

Ars Tribunus Militum
2,147
Subscriptor++
Is it just me or is that weekly report something of a mess? I'm not convinced the bit about the effectiveness of AI is accurate at all. Firstly because it doesn't really make sense (how did ChatGPT help with situational awareness?), secondly because the key phrase is duplicated elsewhere in the report:


And thirdly because elsewhere in the report it directly contradicts that first quote:
MMWR used to be an incredibly rigorous public health publication, but it's part of the CDC, and CDC staff has been cut to the bone, so I don't think it's abnormal that publication standards have dropped quite a bit.
 
Upvote
67 (67 / 0)

Wheels Of Confusion

Ars Legatus Legionis
75,398
Subscriptor
I've had a few ongoing, very minor medical issues that I've mentioned to doctors with no success (Seborrheic dermatitis is one I've had for years and years). They usually shrugged their shoulders and said, "That’s weird," and didn't offer a helpful suggestion. I gave the symptoms to ChatGPT, and it diagnosed the problem right away and suggested an over-the-counter treatment which worked. It was honestly pretty amazing. I’m not saying this is a substitute for real doctors, and I’m sure a specialist would have diagnosed the same thing. But as a supplement to medical professionals, there’s value, I reckon.
I'd rather ask the furry community. Similar track record...
 
Upvote
5 (10 / -5)
This is a worthwhile clarification, but it shouldn't be mistaken for definitively indicating that the association of "credible and likely" is tied to the most important concepts in the prompt.

Given the nature of LLMs, it's very likely that this is a sensible association with the prompt as a whole, but the weight of the various words in the prompt could also adjust how "correct" that response is (either toward a very "proper" weighting conceptually, or toward one that leans on the less important bits).
In this case, the LLM got it right. But the answer here was seemingly so obvious it's a little mystifying why county health investigators needed to ask an LLM at all.

What happens when it's a question with a less obvious answer, and the LLM asserts a confidently wrong answer with real references that contradict the answer in some way that isn't immediately clear? Is anyone going to bother to read the sources closely enough to notice?
 
Upvote
71 (71 / 0)

richardbartonbrown

Wise, Aged Ars Veteran
108
Subscriptor++
It seems the chatbots are arcing into two different usages: they are becoming a 21st-century compiler for developers, and they are becoming a one-stop search tool for the rest of the public, whether they're searching for shopping deals, vacation activities, business information, or medical advice. These public health officials could have searched PubMed, but it was easier and "cuter" to ask ChatGPT. The earlier commenter who found a cure for his/her skin condition could have googled for answers but would have had to sift and evaluate them.
 
Upvote
37 (37 / 0)
This is like including a word processor's autocomplete feature in the acknowledgements of a PhD thesis. Still, if current trends continue, crediting AI may become a prerequisite to getting federal and some corporate jobs, so I can see why a county health official might feel it was a good career move to do so.
 
Upvote
41 (41 / 0)

Vnend

Ars Scholae Palatinae
903
Subscriptor++
Sure, PubMed another tools may be available if we simply go use[...]

It is always a good idea to proof-read when you use voice-to-text tools. And swallowing the 'd' of "and" is a not uncommon speaking error, so it is nearly a coin flip between v2t and speaking error for the cause. (I suppose it could even be a spell-check error if the space between 'and other' got dropped, so we may need a three-sided coin this time...)

All of which are good reasons for proof-reading before you hit send or post.
 
Upvote
17 (18 / -1)

KingKrayola

Ars Tribunus Militum
1,619
Subscriptor
The tendency of LLMs to just make references up is pretty well known and has ended many a legal and consulting career already.
Well yeah, that's why I said you find and read the references. You don't just believe it because it 'has references'.

Normally the second or third layer of references is where the clarity is. Knowing where to start to find them is the donkey work, and where Clippy can help.
 
Upvote
12 (15 / -3)

jdale

Ars Legatus Legionis
18,261
Subscriptor
Thank you! Every time I hear about people wanting to use LLMs as a medical search engine I wonder if I've spent the past two decades hallucinating this tool that we already have!

And it matters. Infectious disease specialists, epidemiologists, and gastroenterologists are going to have a much better grasp of the essential context. A chatbot that pulls some answer out of the aether is useless because I need to know that source!
And just a reminder here about how good LLMs are for medical issues:

https://www.theguardian.com/technol...pt-health-fails-recognise-medical-emergencies

In 51.6% of cases where someone needed to go to the hospital immediately, the platform said stay home or book a routine medical appointment, a result Alex Ruani, a doctoral researcher in health misinformation mitigation with University College London, described as “unbelievably dangerous”.

“If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it’s not a big deal,” she said. “What worries me most is the false sense of security these systems create. If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life.”

In one of the simulations, eight times out of 10 (84%), the platform sent a suffocating woman to a future appointment she would not live to see, Ruani said. Meanwhile, 64.8% of completely safe individuals were told to seek immediate medical care, said Ruani, who was not involved in the study.
 
Upvote
99 (99 / 0)

chantries

Wise, Aged Ars Veteran
143
large makeshift cooler, described as being made of “a 10-ft length of non-food-grade corrugated black plastic farm drainage tile with four internal compartments.” It was reportedly hosed off at the start of the fair, but then never fully drained or cleaned again.

Yoiks!! I don't even need a web search to recoil [!] from this. Even if it was fresh off the back lot of the local supply depot (and nothing in the article suggests that), someone presumably had to saw it in half (where? with what?) and hack in some dividers 'cause drainage tile sure doesn't come with "compartments".

Just. Don't. Do. This.
 
Upvote
28 (28 / 0)

Ipuxi

Ars Centurion
211
Subscriptor++
Because LLMs only preserve the connections between word-pieces, and not the actual meaning, you can have situations where an LLM considers "hyper" and "hypo" to be words that are both associated with a given suffix. And then it will take that suffix and determine the next token.
There are plenty of fallacies with using LLMs, but that really isn't one of them. The big breakthrough with the attention mechanism (from the Google paper "Attention Is All You Need") that enabled LLMs to enter the mainstream in the first place was precisely that it enabled distinguishing different meanings of the same, or similar, words and fragments depending on the context they appear in. It is the reason why modern LLM-based translation software is much more likely to accurately preserve the meaning of texts when translating from one language to another than any previous machine translation method.
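The context-dependence described here can be sketched in a few lines of plain Python. This is a toy: one attention head, one query, hand-picked 2-d vectors, and no learned projection matrices, but it shows the core mechanic, namely that a token's output is a context-weighted mix of its neighbours' values, so the same word piece is represented differently in different sentences:

```python
# Minimal scaled dot-product attention (toy numbers, not a real model).
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    """One attention head, one query: weighted average of the values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Same query vector, two different contexts (keys), same values:
# the output mix shifts with the context.
q = [1.0, 0.0]
keys_a = [[1.0, 0.0], [0.0, 1.0]]
keys_b = [[0.0, 1.0], [1.0, 0.0]]
values = [[2.0, 0.0], [0.0, 2.0]]
print(attend(q, keys_a, values))  # weighted toward the first value
print(attend(q, keys_b, values))  # weighted toward the second value
```

Since the weights come from the query-key dot products, changing the surrounding context changes the mixture, which is (loosely) why "tension" inside "hypotension" need not be treated the same as "tension" inside "hypertension".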
 
Upvote
20 (21 / -1)

ArsMetaluna

Smack-Fu Master, in training
98
That's because an LLM has no motive ...

Yeah, but the people who create and program and train the LLM totally do have motives, and those motives are a) profit and b) promoting right-wing politics. Different LLMs will have different built-in biases depending on what information gets funneled into them. That's why, for example, Elon's Grok periodically says things that contradict what Elon himself says. Then Elon makes changes and Grok starts parroting his line of thought.

LLMs are not unbiased observers. They are mirrors that reflect the humans behind their creation.
 
Upvote
55 (58 / -3)