Two AI-based science assistants succeed with drug-retargeting tasks

nickf · 2026-05-19T15:13:43-0400

I'm mostly retired from research (and academia) now, but I'd certainly find an 'agentic' search of my weekly PubMed search terms and summary of relevant papers useful. Typically I get a couple of hundred literature hits per week, and I can only properly study a handful of those papers.

Likewise an ability to load my articles database (about 5000 papers at present) into an agent and say, here's what I'm interested in, what's new in the literature? What ideas and associations have I missed in my library? Perhaps this ability already exists.

(FWIW I was also involved in drug discovery and development back in the day, for neglected tropical diseases, where there's little funding available. And yes, it's a very hard road. As the article alludes to, almost all lead compounds fail in animal/human studies. Anything that can expedite this process has to be a good thing.)

WaveMotionGum · 2026-05-19T15:25:48-0400

So a possibly better search engine?

DCStone · 2026-05-19T15:30:15-0400

WaveMotionGum said:
So a possibly better search engine?

From my reading of the article, a system connected to existing scientific search engines (there are quite a few long-standing reputable ones) that manages the mass of results those will give you if you don't very carefully refine your search terms.

As stated, this is the kind of thing where carefully generated and validated results actually improve, rather than impede, the workflow. Definitely not the sort of thing that can be vibe-coded on a Friday afternoon though!

JudgeMental · 2026-05-19T15:34:22-0400

WaveMotionGum said:
So a possibly better search engine?

Almost a summary engine, is how I'm imagining it.

Honestly, this is where I could see this kind of AI functionality being appropriate. It's one thing when you're just regurgitating something from Stack Overflow while simultaneously starving them of the traffic they need to stay alive and current. It's another when there's genuinely too much information for an individual to process in a timely fashion. I don't know a ton about academic publishing, so I suspect there are still gotchas to keep in mind in terms of sustainability and ethics, but I would hope they're better equipped to adapt than most websites.

Mrbonk · 2026-05-19T16:53:19-0400

I mean, scientists will be in the loop. Until it's decided it's more profitable without them.

Varste · 2026-05-19T20:14:43-0400

This is the kind of thing I've imagined AI tools being good for, provided the data and specific model is good. The zero percent to FORTY FIVE percent hallucination rate for a general, consumer-facing LLM is kind of insane. Good thing no one is using ChatGPT for medical advice.

Also, I am actually triggered by the phrase "imperfect factuality". What a bullshit PR term.

MilkyBarKid · 2026-05-19T20:28:04-0400

Varste said:
This is the kind of thing I've imagined AI tools being good for, provided the data and specific model is good. The zero percent to FORTY FIVE percent hallucination rate for a general, consumer-facing LLM is kind of insane. Good thing no one is using ChatGPT for medical advice.
Also, I am actually triggered by the phrase "imperfect factuality". What a bullshit PR term.

And that’s just one type of error. The paper doesn’t measure the rate of error in Robin’s final output, which would include not only fake citations but misinterpretation of the literature and the inevitable corruption (Microsoft research have a great paper on this) that comes out of running multiple LLM queries in series.

The only other tests of accuracy of this particular workflow use an LLM as the judge, which adds another source of error to the measurement.

YetAnotherAnonymousAppellation · 2026-05-19T22:30:03-0400

imperfect factuality

I love it! It's even better than "alternate facts."

asharkinasuit · 2026-05-20T00:16:20-0400

MilkyBarKid said:
And that’s just one type of error. The paper doesn’t measure the rate of error in Robin’s final output, which would include not only fake citations but misinterpretation of the literature and the inevitable corruption (Microsoft research have a great paper on this) that comes out of running multiple LLM queries in series.

The only other tests of accuracy of this particular workflow use an LLM as the judge, which adds another source of error to the measurement.

Do you have a link to that Microsoft study? Is it like a purple monkey dishwasher-type situation?

MilkyBarKid · 2026-05-20T02:22:29-0400

asharkinasuit said:
Do you have a link to that Microsoft study? Is it like a purple monkey dishwasher-type situation?

Yes and yes:
“LLMs Corrupt Your Documents When You Delegate”, https://arxiv.org/pdf/2604.15597

Artem S. Tashkinov · 2026-05-20T04:05:16-0400

It's weird that we have so few comments, no "AI SLOP" exclamations, and the usual AI hatred is almost silent.

What happened, dear Ars audience? Did your keyboards break?

Qwertilot · 2026-05-20T04:56:54-0400

Varste said:
This is the kind of thing I've imagined AI tools being good for, provided the data and specific model is good. The zero percent to FORTY FIVE percent hallucination rate for a general, consumer-facing LLM is kind of insane. Good thing no one is using ChatGPT for medical advice.
Also, I am actually triggered by the phrase "imperfect factuality". What a bullshit PR term.

I worked on something looking at this a while back (the basic problem has been around for a while!), and the data is of rather mixed quality. Basically, biology experiments are almost absurdly sensitive to the details of the experimental method, equipment, cell line, phase of the moon, etc.

Yes, the papers do describe this somewhere. But they don't tend to do it remotely clearly or formally. Partially that's habit (there are brilliant ontologies for entity grounding, but their use is rather mixed in practice), and partially it's just genuinely amazingly hard and very tedious to give enough detail.

As a result, iirc, even if you do careful, human-driven replication studies, it's a long way from working at the level you'd like. The hallucination rate will mostly get lost in the inherent noise.

The ideal way to do automatic aggregation would probably involve starting small and building up an automatically reproducible base of knowledge. Or insisting that if people can't describe their experiments in enough detail for a given automatic set up to reproduce the results, it isn't published.
(Whether that would be a good idea I don't know - biology research still produces very good results over time despite all of this!).

fcdecker · 2026-05-20T08:23:13-0400

Artem S. Tashkinov said:
It's weird that we have so few comments, no "AI SLOP" exclamations, and the usual AI hatred is almost silent.

What happened, dear Ars audience? Did your keyboards break?

(shrug) AI (writ large) was already doing useful work before the generative AI hype machine/clusterfuck came along, and will continue to do so after the bubble pops. This is exactly the kind of thing AI could and should be doing, so what's to hate?

We just have to wait for the likes of OpenAI and Anthropic to implode, as they inevitably must (there's literally not enough money in the world to keep them running much longer), at which point the models they leave behind can be optimized and fine-tuned for a million-and-one practical niche uses. It just won't be the kind of all-conquering juggernaut that feeds a VC hype cycle.

Varste · 2026-05-20T10:40:09-0400

MilkyBarKid said:
And that’s just one type of error. The paper doesn’t measure the rate of error in Robin’s final output, which would include not only fake citations but misinterpretation of the literature and the inevitable corruption (Microsoft research have a great paper on this) that comes out of running multiple LLM queries in series.

The only other tests of accuracy of this particular workflow use an LLM as the judge, which adds another source of error to the measurement.

Very true, I didn't really pick up on that. A hallucinated reference is bad (as numerous AI-using lawyers keep finding out), but that says nothing about the content of the analysis itself, or whether it hallucinated data, findings, or connections.

1eardown · 2026-05-20T14:45:40-0400

Artem S. Tashkinov said:
It's weird that we have so few comments, no "AI SLOP" exclamations, and the usual AI hatred is almost silent.

What happened, dear Ars audience? Did your keyboards break?

I'd like to believe it's because Ars readers are discerning: not fanboys, not haters, not Luddites, not yuppies. Definitely not categorically opposed to an entire family of technological solutions, i.e. LLMs.

"Seldom affirm, never deny, always distinguish." - St. Thomas Aquinas

Kenjitsuka · 2026-05-20T16:29:47-0400

Since running human trials is insanely expensive, I actually have faith that the ideas/results will be checked REALLY well before going forward. So a rare AI win!
