Supposedly we all are living in a simulation, so maybe WE are the hallucinations.
To take your point further, none of us can prove anything is real. Who's to say we are all not hallucinating?
Well that certainly helps explain why Windows 11 has completely gone to shit.
The findings are the latest to demonstrate the inherent untrustworthiness of LLM output. With Microsoft CTO Kevin Scott predicting that 95 percent of code will be AI-generated within five years, here's hoping developers heed the message.
It's not possible. LLMs hallucinate by design. They are stochastic parrots spewing back tokens they have seen in the vicinity of similar tokens, weighted by probability. When you train one on trillions of (stolen) documents, it tends to spew back coherent-sounding things because they're stolen from previous documents. But as soon as it starts mixing and matching, the risk of bullshit skyrockets. Again, there is zero thinking going on; it's just "I saw these tokens near these tokens in a couple of documents before." This is why the wrong package names here are not random: it tends to hallucinate similar wrong ones repeatedly.
But the jury seems to still be out on whether it's even possible to design an LLM that never hallucinates.
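To make the "probabilistic weights" point concrete, here is a toy sketch (not any real model; the candidate tokens and scores are invented for illustration) of what next-token generation amounts to: sample from a weighted distribution, with no step anywhere that checks whether the result corresponds to a real package.

```python
# Toy illustration of sampling the "next token" from probabilistic weights.
# The candidates and scores below are made up; a real model has a huge
# vocabulary and learned logits, but the mechanism is the same.
import math
import random

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical continuations a model might score after "import requests_".
candidates = ["oauthlib", "toolbelt", "auth_helper", "pro"]
scores = [2.1, 1.7, 1.3, 0.4]

next_token = random.choices(candidates, weights=softmax(scores), k=1)[0]
print("import requests_" + next_token)  # plausible-looking, existence never checked
```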
It seems like this old XKCD is still applicable.
It is an industry-wide problem. These LLMs all do it; the biggest ones are bad about it, and the smaller ones are slightly worse.
When the creators of these LLMs and the "experts" are saying "well, we can't really say why it does what it does, we don't really understand it," that's the big red warning sign that we shouldn't be depending on them for anything.
I think for the package names you can probably have a whitelisted set of modules that exist at the training cutoff date and just filter all generated import statements. Simple and stupid.
The best you can do at this point is have the LLM 'watch' its own output and attempt to cross-check it, which does somewhat work, except the checker attention head is just as prone to bullshit as the original one, so you need at least three to 'vote' on it, which of course skyrockets the energy cost and still doesn't guarantee anything. I have, when playing around with OpenAI (know the enemy), told it it was wrong about something it was right on just to see what it would do, and it completely accepted that it was wrong and rewired everything to justify that. Claude does the same thing.
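A minimal sketch of the "whitelisted modules, filter the generated imports" idea from the first paragraph above, assuming Python source and a hand-maintained allowlist (the names in ALLOWED and the sample snippet are placeholders, not a real cutoff snapshot):

```python
# Sketch: reject generated code whose imports fall outside a known allowlist.
# ALLOWED would be a snapshot of module names that existed at the training
# cutoff date; the handful here is only an example.
import ast

ALLOWED = {"os", "sys", "json", "requests", "numpy"}

def unknown_imports(generated_code: str) -> set:
    """Top-level modules imported by generated_code that are not allowlisted."""
    tree = ast.parse(generated_code)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found - ALLOWED

sample = "import requests\nfrom totally_real_helpers import magic\n"
print(unknown_imports(sample))  # {'totally_real_helpers'}
```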
Yeah, that's a great point.
Now, you know what LLMs are really good at? Writing malware! Because it's fine if it only works 50% of the time if you can test it, keep the ones that work, then get it out NOW. China has really ramped up on this.
To your point about solving problems, they actually do. Just not to the extent they're being hyped to.
Oh, that's fascinating!
Best quote I heard on the subject was "Why are we using AI to create new problems instead of solving old problems?" and that, of course, is the heart of the matter. LLMs do not solve old problems.
I was wondering how the heck do you detect hallucinations, but I did not at all think of package names as an attack vector. How remarkably insidious! Of course, this has always been a problem with people dropping package names with typos and just waiting for someone to bite, but now your code copilot brings the exploit to you!
I wouldn't even know where I'd start with coding today, since you apparently need to understand supply chain first.
Can we stop being prescriptive about language because even if you had a good point it never actually works?
Can we stop echoing marketing-speak like "hallucinations" or "misalignment" and just call it what it is - "garbage data"?
I usually say that Copilot gets me 90% of the way there, but that 90% of the way was absolutely the grunt work that took me most of the time in the past. That last 10% was the part that was novel to the project anyway, so it basically let me concentrate on the actual problem instead of wasting time on that preparatory nonsense.
LLMs make great helpers for searching obtuse documentation but they're all too happy to regurgitate someone else's Stack Overflow solution which won't be designed for your specific circumstances unless your cases are super generic.
Don't let them write your code, but don't be afraid to use them to find stuff for you.
When it comes to the Copilots and GPTs of the world, the old "trust but verify" saying is very applicable.
To your point about solving problems, they actually do. Just not to the extent they're being hyped to.
They're a very useful tool for information discovery, because old-style search engines have been on their deathbed for years. SEO is making search engines useless, so it's good that LLMs came along when they did.
Except LLMs ignore hard-coded instructions at a rate >1% in my experience.
I suspect code-support agents will eventually be modded with a fair bit of pre- and post-processing code which attempts to avoid this type of blunder. It will be whack-a-mole, but some classes of problem like this can be mostly solved, especially with other tools like a curated list of allowed libs.
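As one concrete flavor of that post-processing, sketched under the assumption that the generated code targets PyPI: before anything gets installed, ask the index whether each dependency even exists. The JSON endpoint below is PyPI's public one; the package names in the example are made up.

```python
# Sketch of a post-processing pass: flag generated dependencies that the
# package index has never heard of (PyPI returns 404 for unknown names).
import urllib.error
import urllib.request

def exists_on_pypi(name: str) -> bool:
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

# Hypothetical dependency list pulled out of generated code.
for dep in ["requests", "requests-auth-helper-pro"]:
    print(dep, "exists" if exists_on_pypi(dep) else "NOT on PyPI")
```

Existence alone stops the pure hallucination case but not a slopsquatted name that someone has since registered, which is where the curated allowlist comes back in.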
It personifies an algorithm to the point that people read its non-hallucinations as reasoning and thinking. Which is dangerous and is exactly why the term needs to be eliminated.
Can we stop being prescriptive about language because even if you had a good point it never actually works?
I don't think the term "hallucination" is doing marketing any favors. Do you usually consider someone who is hallucinating to be a reliable source of information, or a person you would want to entrust with a task? I would argue that someone who is simply wrong/incompetent is actually a more trustworthy person than someone who is hallucinating.
The really insidious miscreants will make their slopsquatted package actually do what it says on the tin in addition to their intended mischief. So the code may even work and leave the developer none the wiser about the malware that hitched a ride with the mostly functional package.
It doesn't; people hit run, get an error, and edit it out. Newer AI may itself take several passes and edit it out in the end.
Except when a malicious party runs the AI and sees it make up a plausible package name. They can then upload a malicious package with that name. From that point onward, when people (or AIs) try to run generated code that has the same made-up name, they install that package, which runs the installer script and potentially compromises their system, or worse yet their customers' systems.
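One partial hedge against exactly that chain, sketched below on the assumption that you're pulling from PyPI and are willing to treat very young packages as suspect: look at the registry metadata before trusting a generated dependency, since a name whose first upload is only days old is the classic slopsquatting pattern. The fields follow PyPI's JSON API; the 90-day threshold is arbitrary.

```python
# Sketch: flag dependencies whose earliest upload to PyPI is suspiciously recent.
# Assumes the name exists on the index (a 404 would raise from urlopen).
import json
import urllib.request
from datetime import datetime, timedelta, timezone

def first_upload(name: str):
    """Return the datetime of the oldest file uploaded for this package."""
    with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10) as resp:
        data = json.load(resp)
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data["releases"].values()
        for f in files
    ]
    return min(times) if times else None

def looks_freshly_registered(name: str, max_age_days: int = 90) -> bool:
    uploaded = first_upload(name)
    if uploaded is None:
        return True  # registered but no files yet: also suspicious
    return datetime.now(timezone.utc) - uploaded < timedelta(days=max_age_days)
```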
Throughout history, the one class that has always prospered is the one that mastered schlepping things efficiently.
Really? That's not clear? The problem they're trying to solve is to increase operating margins by having code "written" faster by people they don't have to pay as much as genuine software engineers.
... It's still unclear to me what problem we're trying to solve, here.
Haven't reached the uncanny valley stage yet.
As an aside though, I get a kick out of people who like to very loudly let the world know that they consider LLMs and similar systems to "not be thinking". As if that makes them any less useful.
Though, from my POV, we're quickly entering the P-Zombie realm of AI. At which point, at least from my viewpoint, it's neither here nor there.
May actually be quite easy to do, when the functionality is just one or two method calls.
The really insidious miscreants will make their slopsquatted package actually do what it says on the tin in addition to their intended mischief. So the code may even work and leave the developer none the wiser about the malware that hitched a ride with the mostly functional package.
One of the things that makes package hallucinations potentially useful in supply-chain attacks is that 43 percent of package hallucinations were repeated over 10 queries. “In addition,” the researchers wrote, “58 percent of the time, a hallucinated package is repeated more than once in 10 iterations, which shows that the majority of hallucinations are not simply random errors, but a repeatable phenomenon that persists across multiple iterations. This is significant because a persistent hallucination is more valuable for malicious actors looking to exploit this vulnerability and makes the hallucination attack vector a more viable threat.”
But it personifies algorithms as something other than an infallible thinking machine that must be right, because it's not prone to human failures.
It personifies an algorithm to the point that people associate non-hallucinations as reasoning and thinking. Which is dangerous and is exactly why the term needs to be eliminated.
It might not be quite as easy as you're suggesting, but I still think you're exactly right. Unless the package is claiming to do something practically impossible, a programmer capable of writing the malicious code is more than capable of writing the actual code to accomplish the intended goal.
May actually be quite easy to do, when the functionality is just one or two method calls.
It's kind of like typosquatting on steroids - the package name doesn't need to sound like any existing package, and there's a set of bullshitted functionality to go with it.
No, if that was the case then they would call it what it truly is - statistical errors. There's a reason the LLM companies chose the term "hallucination" and it isn't because they want to point out the fact that computers shouldn't be trusted. These are the same companies that are calling their new models "reasoning models".
But it personifies algorithms as something other than an infallible thinking machine that must be right, because it's not prone to human failures.
This isn't dangerous at all. It's basically making people think that computers shouldn't be trusted just because they're computers. I would find that kind of thinking to be far more dangerous.
This seems more of a you (and people like you) thing.
No, if that was the case then they would call it what it truly is - statistical errors. There's a reason the LLM companies chose the term "hallucination" and it isn't because they want to point out the fact that computers shouldn't be trusted. These are the same companies that are calling their new models "reasoning models".
The fact that it personifies them at all is a huge cultural problem. This is why there are now "agentic" models coming out. This is why companies are even thinking that these agentic models can replace developers and other things (a company I work for is currently exploring this sad excuse for productivity). The LLM companies are influencing high-level decisions by choice of words, and unless you correct those terms, the people who are completely ignorant of the tech will personify the algorithms to the point that they trust them the same way they trust people that just make mistakes sometimes.
Even on ars, whenever there's a PR fluff piece ("news") about OpenAI features and commenters bring up LLMs making mistakes, the common argument for the LLMs is "well, people make mistakes too!" It's a branding term and it's inappropriate and dangerous.
This seems more of a you (and people like you) thing.
There's a pretty common saying along the lines of "computers don't make mistakes". It's already ingrained in people's thinking. All those horses have not only left the barn but are completely out of sight.
I stand by it: saying that the computer is hallucinating is actually a great thing.
But in any case, it's a fool's errand to try to police language. Apart from deciding that we shouldn't use derogatory language, it's never really worked. And even then it wasn't specific language policing but the culture as a whole.
Like outsourcing coding, perhaps.
Really? That's not clear? The problem they're trying to solve is to increase operating margins by having code "written" faster by people they don't have to pay as much as genuine software engineers.
The difference is at what type of usage these things become unsafe and unreliable, and the extent to which they are unreliable.
Any car can get in an accident; that doesn't mean that if VW has a safety issue we can brand all cars as unsafe.
Sure, there's also potential for hallucinations, but it varies based on the model, the model's guardrails, grounded data, the prompt, and other mechanisms such as RAG.
Imagine if out of 576,000 car trips across a variety of manufacturers, they counted 440,000 instances of the car breaking down mid-trip and requiring fixing. Not because any one manufacturer was just complete fucking shit at their job, but because steering wheels just start spinning sometimes and nobody understands why.
The study, which used 16 of the most widely used large language models to generate 576,000 code samples, found that 440,000 of the package dependencies they contained were "hallucinated," meaning they were non-existent.
You think hallucinating is just "making errors"? Talk about a poor argument. Hallucinations are a thing that can get you committed. That's not just an "oopsie" level. That's a "this person probably shouldn't be trusted with any sort of responsibility" category.
That's a pretty poor argument: because people have a tendency to believe computers are infallible, we should therefore add language that not only encourages the belief that they are infallible, but that they are actually reasoning beings that can think and analyze and, yes, also "hallucinate" because they make errors just like humans.
Uncanny valley as I understand it is more a visual thing.
Haven't reached the uncanny valley stage yet.
Uncanny valley has Yoda's syntax, would say one not?
Uncanny valley as I understand it is more a visual thing.
You think hallucinating is just "making errors"? Talk about a poor argument. Hallucinations are a thing that can get you committed. That's not just an "oopsie" level. That's a "this person probably shouldn't be trusted with any sort of responsibility" category.
Our most fundamental level of judging whether a person is capable of looking after themselves and others is if they are living in the same shared reality. And not just whether they agree with our politics or whether or not we landed on the moon. Rather, whether it's okay to drive on the opposite side of the freeway because you got a great idea that it would be more efficient.
(I want to note here that I don't agree with society's attitudes towards mental illness, specifically those who experience hallucinations. It's a harmful stereotype that they are a danger to others, or that they cannot live a normal life. But since we're talking about language here, I have to recognize the reality of the way people use that language today.)
A bit of a strawman you are building there.
So you have no problem with false advertising? Tesla calling their driving assistance Full Self Driving isn't deceptive?
Your very premise on its origins is faulty.
Which is why it's a problem when we allow companies to dictate terms that give the impression of services that frankly don't exist.
Maybe because you were too busy telling people not to use words you don't like rather than listening to just how derisively people were using it.
I have never met anyone that actually associates LLM hallucinations with the same negative context in which human hallucinations occur. In fact, they use it in the context of it making a human-level mistake.
You mean like the junior-level code backed with senior-sounding loudmouth keyword dropping like "test coverage" and "agile" ... code that you only casually audit and in minutes find their package.json including libraries that don't even do what the developer thought they did?
AI-generated computer code is rife with references to non-existent third-party libraries, creating a golden opportunity for supply-chain attacks that poison legitimate programs with malicious packages ...
We can reuse the old joke about regex here: "Why are we using AI to create new problems instead of solving old problems?"
I'd say lying is a small word, and what's more, it applies well, because intelligence (really being used here as a substitute for self-awareness) is irrelevant. It is telling us things that are factually false as though they are true. If that's not lying, then the definition of lying is meaningless.
Lying is a big word and projects too much intelligence onto these models. It would be better applied to the people selling them.
Factually true but misleading. LLMs are intended to act like us, and so if they do things that would be hallucinations coming from us, or lies, then to all intents and purposes they are hallucinating or lying.
To take your point further, there are no such things as hallucinations. All LLM output is a statistically chosen best guess. Some of those guesses happen to be correct, some incorrect, but there is no difference beyond that.
Of JavaScript and Python, that's the first misstep... Ugh.
... 576,000 code samples ...
How is this going to solve anything, though?
I think all F/OSS projects should put an anti-cheat policy in place. No LLM-generated or even LLM-assisted code contributions.
I've seen generative AI slop swamp communities and make it nearly impossible for the quality assurance and content safety groups to keep up. The only viable solution right now is simply to close the floodgates.