AI-generated code could be a disaster for the software supply chain. Here’s why.

VividVerism · Apr 30, 2025

chaos215bar2 said:
All you need is a centralized registry where contributors earn reputation points of some form, and that's a pretty well solved problem.

No, it isn't.

MagneticNorth · Apr 30, 2025

Tam-Lin said:
No, you don't. A dependency is a risk, it always has been. It's often not something people think about/want to think about, but you've always had to trust a bunch of entities when you included code in your project. Implicitly, you're declaring that the benefit you get from the dependency outweighs the risk. AI risks just make them more sexy, suddenly.

I am not sure what point you are arguing or if we even disagree.

Any dependency is a risk, agreed. Being able to use what for example RedHat includes instead of finding it on your own lessens that risk, agreed.

If you are arguing that you can always find the functionality you need from an externally vetted source or can write it yourself, then I don’t agree at all. That depends on which domain you are working in and which platforms you have to support.

And when this happens you lose the security from having a trusted source and take on that responsibility yourself, and unless you have training and processes in place to ensure you keep handling it as the dependency is upgraded, people move between projects, or leave, then you increase the risk a lot. My argument is that a lot of devs and companies don’t do this when they should, because hey it’s boring and costs time and money, and these are risks we can’t really get rid of.

graylshaped · Apr 30, 2025

silverboy said:
I'd say lying is a small word, and what's more, it applies well, because intelligence (really be used as a substitute for self-awareness) is irrelevant. It is telling us things that are factually false as though they are true. If that's not lying, then the definition of lying is meaningless.

Lying is the wrong word because it implies intent, which LLMs do not have.

An LLM is, simply, wrong too often. It is unreliable. It is prone to errors of unpredictable frequency and severity. It lacks the sense [deity] gave a goose. It is a loose cannon. It clogs the development pipeline for promising entry-level candidates. At no point should the output of an LLM be handled any differently than the work of an intern who you suspect was out too late trying to impress colleagues the night before, and whose work product seems somewhat...off when compared to the recommendations accorded that individual.

The developers who sell these beta-caliber products without these caveats in bold letters, rather than in fine print, are the lying liars who lie.

matthewslyman · Apr 30, 2025

<s>Quiet down Dan: Sam Altman & Co. are busy amplifying the hype! You don't want anyone questioning the legitimacy of their claims of being on the very verge of "AGI", or challenging the applicability of our shiny new Artificial Intelligence overlords/tools (one or the other, never both at the same time!) to professional, commercial programming problems. Altman totally needs 7 trillion dollars to help us beat China/ DeepSeek & Alibaba & Baidu in the race to the end of the rainbow, the next horizon of AI/ML consciousness!!! Tech workers' salary demands were getting far too expensive especially in San Francisco, and after California rumbled the big tech firms' secret non-compete agreements we totally need this new vaporware, rumor mill, and mass layoffs to bring those overgrown office workers down to Terra Firma. Don't pay any attention to our previous predictions that 90% of all workers were bound to be made redundant by Artificial Intelligence already 18 months ago (or our repeated, loud, spooky warnings that the little people don't know what's coming so that everyone should listen ever more carefully to us, and incline their ears to our reading of the tea leaves), we pinky swear we're still on the very verge of ~~FSD~~ AGI, with just a little bit further to go…</s>

Seriously, what's the difference between using religion as a way to control the masses in the 1600s, vs. the Tech Bros/ false prophets of technology frightening everyone with tales of impending doom (vs. utopia if only we follow their prescriptions) to seize control of the lion's share of available capital?

Regarding corporate CTOs who are confidently predicting that 95% of code will be written by LLMs within months (and that AI "agents" won't merely be assisting real software engineers to work faster when creating lines/ blocks/ files/ modules of boilerplate or interface code); all we need to ask is: what would happen to those CTOs if they disagreed with those hyped-up predictions? What would happen to the stock of their corporate employer? (Would such a CTO be replaced with a true believer who could do a Musk, and keep promising FSD for a decade or more?) If they can't disagree with the Zeitgeist without being fired; then is this a case of The Emperor's New Clothes?

Granted, these tools are amazing and we can do great things with them; but can't the vendors just tell us the truth for once instead of trying to sell their faulty, dangerous products into environments where they don't fit? Why should we trust them or their word, if they almost certainly won't face real personal consequences (death, injury, imprisonment, or poverty) if 10 years from now, we find that they were lying, exaggerating, or ignorantly speculating about the capabilities of their products or the prospects of their economic impact? (The law is strict: proof of malice is usually required to impose real penalties. Therefore we should listen skeptically to their claims!)

When is a real expert going to do some proper analysis instead of making off-the-cuff hand-wavy guesstimates that 95% of coders will be replaced while 95% of coding gets done by machines while 20× more code gets generated and infinity × more problems get solved as we approach an AI singularity, so that we're all getting fired from our jobs by machines that are better than us while simultaneously living in an abundant egalitarian Utopia — or instead of making different predictions depending on which audience they're talking to (citing cost savings & profits to stock investors, while citing productivity gains to their product teams)?
There's a hard core of code in OS kernels, low-level filesystems, device drivers, commonly used software libraries, core product architecture & functionality, etc. For security and reliability reasons, we're never going to coded that with an LLM and call it "done" if it passes a few unit tests. We might automate more testing with the latest technology, reducing workload by 10%; but we still need real experts doing the hard yards!

Walshicus · Apr 30, 2025

Ostracus said:
People are kind of a black box, and we still depend on them (can't imagine why. )

Maybe, but people are also accountable and have access to "reality". Our individual experiences are by and large verifiable because we've lived them.

foobarian · Apr 30, 2025

Jarrex said:
Except LLMs ignore hard-coded instructions at a rate >1% in my experience.

Exactly, which is why some deterministic old-fashion code scanners would run after the LLM.

WesGordon · Apr 30, 2025

hillspuck said:
Can we stop being prescriptive about language because even if you had a good point it never actually works?

I don't think the term "hallucination" is doing marketing any favors. Do you usually consider someone who is hallucinating to be a reliable source of information, or a person you would want to entrust with a task? I would argue that someone who is simply wrong/incompetent is actually a more trustworthy person that someone who is hallucinating.

Marketing uses language manipulatively to influence readers, e.g. using anthropomorphic language to hype up their product. A hallucination is a mind malfunction, caused by illness, drugs, etc., but AI - like any software - is functioning normally when it emits garbage data. But marketing spin doesn't deal in plain truths like, "the normal functioning of this machine includes garbage data output". That simple truth would likely (and rightly) trigger a lot more scepticism about its practical applications.

VividVerism · Apr 30, 2025

WesGordon said:
Marketing uses language manipulatively to influence readers, e.g. using anthropomorphic language to hype up their product. A hallucination is a mind malfunction, caused by illness, drugs, etc., but AI - like any software - is functioning normally when it emits garbage data. But marketing spin doesn't deal in plain truths like, "the normal functioning of this machine includes garbage data output". That simple truth would likely (and rightly) trigger a lot more scepticism about its practical applications.

I'm mostly OK with terms like "hallucination" and "confabulation". What annoys me is headlines talking about a bot's "personality" or people using he/she pronouns to refer to it.

San Diego Dude · Apr 30, 2025

I code with AI daily. The secret of AI coding is to know WTF you're doing going in so you can understand what the AI is suggesting, also to do targeted updates. Feeding a 3k line script into an AI and telling it to 'improve this for me', even on the latest foundation super coders, is a recipe for disaster. Vibe coding is fun for weekend projects, but doesn't belong anywhere near production code, at least not without somebody who knows how it works.

Also, CVE scanner exist for a reason. I'm no more convinced a human coder will be as up to date with zero day exploits as an LLM coding assistant. Don't rely on your own programming knowledge OR an AI to avoid exploits. Use a scanner FFS.

VividVerism · Apr 30, 2025

San Diego Dude said:
Also, CVE scanner exist for a reason. I'm no more convinced a human coder will be as up to date with zero day exploits as an LLM coding assistant. Don't rely on your own programming knowledge OR an AI to avoid exploits. Use a scanner FFS.

By definition, there are no scanners in existence that will check for zero days. Once they are discovered and added to a scanner they are no longer a zero-day. And the type of issue that an LLM could introduce into your code will certainly never be found by a CVE scanner. A CVE scanner looks for known vulnerabilities in existing code. The LLM will introduce new unknown vulnerabilities in new code.

You can certainly reduce the risk by using static analysis tools. You can reduce it further by developing a detailed test suite with good feature coverage or even code coverage. Even better would be setting up fuzzing tests. All of that might actually find new vulnerability introduced by the LLM. But with the exception of static analysis, they're going to take significant effort to set up (assuming you want good results). And they're still no substitute for secure by design development in the first place, which all evidence so far shows the LLMs don't follow. Automated scanners will miss things, even things they're theoretically able to find, let alone things completely outside their scope.

sigmasirrus · Apr 30, 2025

Renx said:
Oh, that's fascinating!

Best quote I heard on the subject was "Why are we using AI to create new problems instead of solving old problems?" and that, of course, is the heart of the matter. LLMs do not solve old problems.

I was wondering how the heck do you detect hallucinations, but I did not at all think of package names as an attack vector. How remarkably insidious! Of course, this has always been a problem with people dropping package names with typos and just waiting for someone to bite, but now your code copilot brings the exploit to you!

I wouldn't even know where I'd start with coding today, since you apparently need to understand supply chain first.

Only use well known, well established dependencies. Use as few as possible.

Kenjitsuka · Apr 30, 2025

The findings are the latest to demonstrate the inherent untrustworthiness of LLM output.

Amen!!!!!!

A programmer friend of mine said "Most of the things they produce for me wont even run", when I showed him this article.

aggri1 · Apr 30, 2025

I use LLMs a lot for my coding, but I also almost always start with a clear scope such as "I have a Xarray DataArray with dimensions ..." and limit 'please write me code to ...' requests to specific clearly-defined tasks. In these cases, LLMs will (in my experience) use only the standard libraries I'd use anyway (xarray, pandas, numpy, etc...) and are a great help. Saves me heaps of time in writing boring standard functions which are easy to verify.

And when I get more adventurous I soon realise that I'm wasting time trying to get the LLM to solve the problem.

dkazaz · May 1, 2025

Why is this still a surprise? LLMs do not really reason and do not really write code. They apply an algorthic approach to produce “code like“ that is not in fact code but is close enough to be compiled and run.

But of course they will hallucinate and make errors. And that means code that is unreliable and vulnerable. And probably hard to maintain over time. Also considering again that LLMs do not reason, asking them to review or audit code is probably a bad idea too.

Ozy · May 1, 2025

dkazaz said:
Why is this still a surprise? LLMs do not really reason and do not really write code. They apply an algorthic approach to produce “code like“ that is not in fact code but is close enough to be compiled and run.

But of course they will hallucinate and make errors. And that means code that is unreliable and vulnerable. And probably hard to maintain over time. Also considering again that LLMs do not reason, asking them to review or audit code is probably a bad idea too.

How much have you worked with the new models to write, review, or audit code?

I mean, have you tried it?

Fritzr · May 4, 2025

WhatDoesTheFoxSay said:
Okay, I'm actually glad you brought this up because it's an excuse to do a ~~LINGUISTIC SIDEBAR~~

I completely agree with your take here. The output that's extruded by generative AI models is more accurately called "bullshit", which are things that are expressed with no regard for whether or not they're true.

BUT, dropping the word "bullshit" into a conversation about AI is much more likely to turn people against you. It's seen in common conversation as exaggeration and it's hard to explain and justify its use every time.

But if you can think of words other than "bullshit", that we can use to substitute for "lying", I am honestly open ears. One reason I love engaging on Ars, is I can respond to an article and then witness how it's received. Sometimes, I do okay. Sometime, it blows up in my face and I think about how I can do better next time.

Hallucination works and means to ordinary people exactly what it means to AI users.

Imaginary information dreamed up without foundation and presented as truth.

AI is a code generator on an LSD trip.
Computerised schizophrenia.

AI-generated code could be a disaster for the software supply chain. Here’s why.

Ars Tribunus Angusticlavius

Wise, Aged Ars Veteran

Ars Legatus Legionis

Ars Tribunus Militum

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Praetorian

Ars Tribunus Angusticlavius

Ars Centurion

Ars Tribunus Angusticlavius

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Centurion

Ars Praetorian

Ars Tribunus Angusticlavius

Ars Legatus Legionis