> So a possibly better search engine?
From my reading of the article, it's a system connected to existing scientific search engines (there are quite a few long-standing reputable ones) that manages the mass of results those will give you if you don't very carefully refine your search terms.
> So a possibly better search engine?
Almost a summary engine, is how I'm imagining it.
> This is the kind of thing I've imagined AI tools being good for, provided the data and specific model is good. The zero percent to FORTY FIVE percent hallucination rate for a general, consumer-facing LLM is kind of insane. Good thing no one is using ChatGPT for medical advice.
> Also, I am actually triggered by the phrase "imperfect factuality". What a bullshit PR term.
And that’s just one type of error. The paper doesn’t measure the rate of error in Robin’s final output, which would include not only fake citations but misinterpretation of the literature and the inevitable corruption (Microsoft research have a great paper on this) that comes out of running multiple LLM queries in series.
> imperfect factuality
I love it! It's even better than "alternate facts."
> And that’s just one type of error. The paper doesn’t measure the rate of error in Robin’s final output, which would include not only fake citations but misinterpretation of the literature and the inevitable corruption (Microsoft research have a great paper on this) that comes out of running multiple LLM queries in series.
> The only other tests of accuracy of this particular workflow use an LLM as the judge, which adds another source of error to the measurement.
Do you have a link to that Microsoft study? Is it like a purple monkey dishwasher-type situation?
> Do you have a link to that Microsoft study? Is it like a purple monkey dishwasher-type situation?
Yes and yes:
This is the kind of thing I've imagined AI tools being good for, provided the data and specific model is good. The zero percent to FORTY FIVE percent hallucination rate for a general, consumer-facing LLM is kind of insane. Good thing no one is using ChatGPT for medical advice.
Also, I am actually triggered by the phrase "imperfect factuality". What a bullshit PR term.
> Weird we have so few comments, no "AI SLOP" exclamations and the usual AI hatred is almost silent.
> What happened, dear Ars audience? Your keyboard has broken?
(shrug) AI (writ large) was already doing useful work before the generative AI hype machine/clusterfuck came along, and will continue to do so after the bubble pops. This is exactly the kind of thing AI could and should be doing, so what's to hate?
> And that’s just one type of error. The paper doesn’t measure the rate of error in Robin’s final output, which would include not only fake citations but misinterpretation of the literature and the inevitable corruption (Microsoft research have a great paper on this) that comes out of running multiple LLM queries in series.
> The only other tests of accuracy of this particular workflow use an LLM as the judge, which adds another source of error to the measurement.
Very true, I didn't really pick up on that. A hallucinated reference is bad (as numerous AI-using lawyers keep finding out), but that says nothing about the content of the analysis itself, or whether they hallucinated data or findings or connections.
> Weird we have so few comments, no "AI SLOP" exclamations and the usual AI hatred is almost silent.
> What happened, dear Ars audience? Your keyboard has broken?
I'd like to believe it's because Ars readers are discerning--not fanboys, not haters, not luddites, not yuppies. Definitely not categorically opposed to an entire family of technological solutions--i.e. LLMs.