Your doctor’s AI notetaker may be making things up, Ontario audit finds

Arstotzka

Ars Scholae Palatinae
1,244
Subscriptor++
But that seemingly key “accuracy” metric was only responsible for about 4 percent of a vendor’s overall score, making it easy to meet the minimum threshold for approval even if an AI scribe scored a “zero” on the accuracy metric (a separate metric measuring “domestic presence in Ontario” was worth 30 percent of the overall scoring).
Accuracy: 4%
Domestic Presence in Ontario: 30%

It is refreshing to see priorities spelled out so honestly. Here's the table from the linked PDF, if anyone else is curious. Domestic presence was the highest-weighted criterion, beating out trivialities such as accuracy, security, formatting, usability, and privacy.
[Attachment: screenshot of the scoring table from the linked PDF]
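To see how lopsided that rubric is, here's a rough back-of-envelope sketch. The 4% accuracy and 30% domestic-presence weights are from the audit's table; the remaining weights and the passing threshold are hypothetical, just to illustrate how a vendor can score zero on accuracy and still sail through:

```python
# Hypothetical weighted-rubric sketch. Only the accuracy (4%) and
# domestic-presence (30%) weights come from the audit; everything
# else here is made up for illustration.
weights = {
    "accuracy": 0.04,
    "domestic_presence": 0.30,
    "security": 0.20,       # hypothetical
    "privacy": 0.20,        # hypothetical
    "usability": 0.16,      # hypothetical
    "formatting": 0.10,     # hypothetical
}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights sum to 100%

# A vendor that aces every criterion except accuracy:
scores = {name: 1.0 for name in weights}
scores["accuracy"] = 0.0

total = sum(weights[name] * scores[name] for name in weights)
print(f"overall score: {total:.0%}")  # prints "overall score: 96%"
```

Under those assumed weights, zeroing out accuracy only costs four points overall, so a hypothetical 60% or 70% approval bar is cleared easily.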
 
Upvote
141 (141 / 0)
As someone in Ontario who regularly needs medical help and uncommon medications, and whose conditions change whenever they feel like it, I have to see my doctor often. And recently they posted notices on the walls stating that "MOH (Ministry of Health) approved AI" is being used.

And it's absolutely wrong.

They used Telus Health to sync health records, which had issues but worked, moving away from paper. And now my doc and everyone who works there are forced to use shit-tier AI, which is blatantly wrong.

I'd argue it's just over 70% wrong, considering my immense 1,100+ page medical record. Because some things like "stage fright," "performance anxiety," or "shy bladder" don't exist. Not in the AI's world.

To save the lives of Ontarians, the entire board of CPSO and MOH needs to be fired ASAP. CPSO especially; they seek to punish doctors that help. I'd give every person that makes up the CPSO board 2yrs 1mo in prison, tbh. That way they'd be listed as serious criminal offenders. Because they denied healthcare for the lulz, and because they're completely incompetent.

Out in 2yrs, but with a criminal record. Slap on the wrist imho, considering how many are denied medical help. But a criminal record for the rest of their lives.
 
Upvote
76 (78 / -2)

spacekobra

Refiner of the Quarter
926
Subscriptor++
why can’t the doctors just do their job and write notes??
Just so it's clear, this is an audit of a simulated situation to make sure that the tools being advertised are up to snuff for Ontario doctors.

As to why doctors can't just write their charts: Ontario has a doctor shortage, doctors are overwhelmed, and AI tools are promising to make their lives easier. So, what do you expect? "Buy my tool and make your note-taking brainless" sounds like a great offer.
 
Upvote
68 (72 / -4)

roboninja73

Smack-Fu Master, in training
33
why can’t the doctors just do their job and write notes??

That's a lot of salary for a notetaker. There may be no valid way around it, but trying to free up some of their time for more technical tasks that require their actual expertise seems like a valid endeavour.
 
Upvote
55 (66 / -11)
why can’t the doctors just do their job and write notes??
Short answer: it's an opportunity cost in time that could be spent seeing/treating patients, and people (AKA prospective patients) already complain about wait times to see doctors. That said, the trade-off between hallucinating LLMs and longer patient wait times is... clearly problematic.
 
Upvote
52 (53 / -1)

Sarty

Ars Tribunus Angusticlavius
7,939
It isn't the number in this article that will attract the most attention, but they evaluated twenty vendors of this crap? LLMs weren't quite invented yesterday. How is the marketplace that differentiated, when there aren't really equivalents to production or shipping bottlenecks? Some tool ought to out-compete most of the field, shouldn't it?

I'm reminded of that saying in football--if you have two viable quarterbacks, you really have none. Same goes. If you have twenty approvable LLM medical scribe tools available, you really have none.
 
Upvote
10 (14 / -4)
So who is on the hook when these AI tools go wrong, in a field like healthcare, where consequences are life or death? Particularly when the hallucinating tools are actually recommended by government orgs?
The Doctor, ultimately. I work in radiology, and we have been using speech to text for years. It's up to them to proofread. If it is wrong, and there is a lawsuit, they will be hung out to dry.
 
Upvote
73 (73 / 0)

Tam-Lin

Ars Scholae Palatinae
845
Subscriptor++
why can’t the doctors just do their job and write notes??
They can and do; it’s why my wife, who is officially scheduled to work from 8 AM - 4 PM, routinely doesn’t get home until midnight. Because doctors get reimbursed for seeing patients, so their employers schedule them to see as many patients as possible, and don’t make any allowances for all of the ancillary work that has to be done around the patient encounters.
 
Upvote
100 (100 / 0)
It isn't the number in this article that will attract the most attention, but they evaluated twenty vendors of this crap? LLMs weren't quite invented yesterday. How is the marketplace that differentiated, when there aren't really equivalents to production or shipping bottlenecks? Some tool ought to out-compete most of the field, shouldn't it?

I'm reminded of that saying in football--if you have two viable quarterbacks, you really have none. Same goes. If you have twenty approvable LLM medical scribe tools available, you really have none.
We're still in the "race to market share" stage of the bubble, where venture capital and speculation are propping up more options than will be viable. Consolidation and retrenchment will come eventually as the free money slows down and profit fails to materialize; that, or mergers, as bigger players in health or whatever industry look to snap up these products, which really belong as a feature rather than as a standalone offering.
 
Upvote
22 (22 / 0)

einstein4pres

Seniorius Lurkius
19
Subscriptor++
why can’t the doctors just do their job and write notes??
In my experience, doctors are still responsible for the notes (whether self-written, AI-scribed, human-scribed, or dictated). The general idea is that these AI tools are sufficiently cheap and good that it frees up the doctor to spend less of their time writing notes and more of their time actually doctoring (either spending more time with each patient or seeing more patients).

Obviously, whether this is an actual value proposition will depend on the quality of the AI tool in question (for the given provider/provider's specialty).

I haven't done any analysis of the quality side of this, but I can tell you that such tools are quite popular with providers.
 
Upvote
15 (15 / 0)
The Doctor, ultimately. I work in radiology, and we have been using speech to text for years. It's up to them to proofread. If it is wrong, and there is a lawsuit, they will be hung out to dry.
I wasn't sure if a sanctioning org, like actual high-level government, okaying this thing would change that or not.
 
Upvote
4 (4 / 0)

Alethe

Ars Centurion
257
Subscriptor
The Doctor, ultimately. I work in radiology, and we have been using speech to text for years. It's up to them to proofread. If it is wrong, and there is a lawsuit, they will be hung out to dry.
As someone else said, moral and legal crumple zones for both their employers and the providers of these models. Despicable.
 
Upvote
10 (11 / -1)

Fatesrider

Ars Legatus Legionis
25,280
Subscriptor
why can’t the doctors just do their job and write notes??
Rhetorical question, but the issue is pretty straightforward.

They can't fucking write. Source: Me, after >20 years in the medical field earning a PhD in hieroglyphic interpretation. My schooling came from my father, who should have been a physician given how horrible his handwriting was.

So they have to learn to type. That's a WIP for most of them. They're taught a lot of skills in medical school, but it SEEMS that typing, legibility and coherence aren't among them. MOST dictate their notes, and expect a human to interpret them correctly. Mostly, they do. But given how they're going AI on that to get rid of the humans, I suspect that's where it's happening.

BTW, dictating notes has been a thing for 40 years.

The issue with AI is that all medical records are (supposedly) sealed, so it has no real clue HOW doctors write notes. So I'd expect it to take the medical shorthand that's often used and play with it. Abbreviations will throw it (prn, QD, QID, p.o., IV, IM, etc.), and the use of medicalese (formal medical anatomy & physiology along with tests, etc.) isn't common out in the "normal world".

Another aspect, having nothing to do with that, is that doctors don't have the TIME. Specialists, yes, they might see a lot fewer patients. But a GP will see 30+ patients/day, and THEN have to write notes on all of them, with some being a lot more comprehensive than others.

I can see WHY they'd want to use AI. But AI, as it's typically trained, will fuck that up very badly. So this result is not only not surprising, it was predictable - for anyone who has both a tech and medical background that is.
 
Upvote
25 (29 / -4)

JudgeMental

Ars Centurion
341
Subscriptor++
Accuracy: 4%
Domestic Presence in Ontario: 30%

It is refreshing to see priorities spelled out so honestly. Here's the table from the linked PDF, if anyone else is curious. Domestic presence was the highest-weighted criterion, beating out trivialities such as accuracy, security, formatting, usability, and privacy.
This is what I came to note. It's insane that "does it actually work" is the lowest-weighted metric. Then again, that matches my experience with just about any other legal or corporate entity, so I'm not actually surprised either.
 
Upvote
51 (51 / 0)

IncreaseMather

Smack-Fu Master, in training
69
Subscriptor
Because it was never that good either; the gold standard for transcription is a person, often disabled or otherwise home-bound, who is usually amazingly fast and accurate but, you know, costs money.
Physician here who for years relied on transcriptionists. They were phenomenal, excellent at their jobs, and many helped catch mistakes/improved the clarity of medical jargon. When my institution switched to Dragon, I changed to typing my own notes out. It was faster and more accurate than Dragon ever was (it hates my Southern drawl). And now my institution has rolled out AI similar to what this article is addressing. I plan to never use it.
 
Upvote
80 (80 / 0)

gmyx

Ars Centurion
231
Subscriptor
I wonder if they tested / emulated a bilingual conversation - I don't see that in the document. Not just English/French but the many other languages that exist in the province. I know when I talk to my doc I routinely switch between English and French, sometimes mid-sentence. My experience with Teams is that it just shits the bed and makes shit up, more than normal.
 
Upvote
19 (19 / 0)
[edit: deleted inaccurate stuff]
I notice errors in the CC [edit for clarity: closed captioning or subtitles] of almost every movie I watch, even in some cases flipping the meaning to the complete opposite. It is slightly concerning to think of that happening to the notes taken during my doctor visit. But as long as doctors are being held accountable for mistakes, I say let them use their professional judgment, just like they do for life-and-death medical decisions on a daily basis.
 
Last edited:
Upvote
9 (9 / 0)

Doug DigDag

Smack-Fu Master, in training
95
The benefit of note-taking is only occasionally derived from reading the notes.

The main benefit of note-taking is writing the notes. That is, the act of writing it down, of converting your perceptions into words, identifying the most important features, even of just moving your fingers on the pen, all serve to broaden your memory and understanding of the events on which the notes are being taken. All that is extremely valuable.

But as none of those intangible things are being directly tracked on any accountant's spreadsheet, zero value is assigned to them by the people writing the checks. What is valued is instead just this: quantity of text. And cost-effectiveness, real or illusory, is king.
 
Upvote
40 (42 / -2)
Why is AI involved at all rather than basic dictation software we already had?

Both my parents are doctors. You’d be surprised at how bad those are as well. And how bad humans are too.

A good study would compare different methodologies so we can determine what the optimal outcome ought to be.
 
Upvote
27 (27 / 0)

Anadromous

Ars Scholae Palatinae
602
Subscriptor++
I have used two different scribe platforms, with most of my experience being with Heidi. For me, it means I can look at the client while we are talking about things.
(Client not patient in my case, I'm a veterinarian).
Being able to look at the client and maintain that one-to-one connection is really helpful.

I can say the things that I am finding when I am doing an examination, and they are recorded in real time rather than forgotten when writing notes later.

And most important, there's a verbatim transcript stored with each consult, timestamped. So when the client claims that you said "x" or did not say "y", you have a nice verbatim record that you indeed did or didn't do the things that are being claimed.

I read every note for completeness and accuracy, and every discharge statement/summary for completeness. Still takes me far less time than it used to with traditional note taking. As with everything, it is important to use the tool, not let the tool use you.
 
Last edited:
Upvote
40 (43 / -3)

graylshaped

Ars Legatus Legionis
68,207
Subscriptor++
So who is on the hook when these AI tools go wrong, in a field like healthcare, where consequences are life or death? Particularly when the hallucinating tools are actually recommended by government orgs?
Prediction: Malpractice carriers will begin to exclude coverage for errors attributable to unproven models.
 
Upvote
14 (14 / 0)

CosmicCaribou

Smack-Fu Master, in training
52
Physician here who for years relied on transcriptionists. They were phenomenal, excellent at their jobs, and many helped catch mistakes/improved the clarity of medical jargon. When my institution switched to Dragon, I changed to typing my own notes out. It was faster and more accurate than Dragon ever was (it hates my Southern drawl). And now my institution has rolled out AI similar to what this article is addressing. I plan to never use it.
Props for this!
 
Upvote
20 (20 / 0)

Tam-Lin

Ars Scholae Palatinae
845
Subscriptor++
This is what I came to note. It's insane that "does it actually work" is the lowest metric. Then again, that matches my experience with just about any other legal or corporate entity, so I'm not actually surprised either.
These days, I’m not sure I’d agree. Digital sovereignty is a serious issue. Let’s say you do have an amazingly accurate solution, but it’s supplied by a company in a different country, maybe even a competitor. Or maybe a country you thought was friendly, but then the populace elects a completely unfit person to lead the government. How confident are you that you’ll be able to rely on that solution?
 
Upvote
-1 (9 / -10)