AI-generated code could be a disaster for the software supply chain. Here’s why.

Strano · Apr 29, 2025

But the vibes, man, the vibes!

Rodinga · Apr 29, 2025

LLMs make great helpers for searching obtuse documentation but they're all too happy to regurgitate someone else's Stack Overflow solution which won't be designed for your specific circumstances unless your cases are super generic.

Don't let them write your code, but don't be afraid to use them to find stuff for you.

jacs · Apr 29, 2025

here’s hoping developers heed the message.

There are a lot of assumptions built into that last sentence.

Renx · Apr 29, 2025

Oh, that's fascinating!

Best quote I heard on the subject was "Why are we using AI to create new problems instead of solving old problems?" and that, of course, is the heart of the matter. LLMs do not solve old problems.

I was wondering how the heck do you detect hallucinations, but I did not at all think of package names as an attack vector. How remarkably insidious! Of course, this has always been a problem with people dropping package names with typos and just waiting for someone to bite, but now your code copilot brings the exploit to you!

I wouldn't even know where I'd start with coding today, since you apparently need to understand supply chain first.

J.King · Apr 29, 2025

Could be? Yikes. This is not at all surprising, but I admit the potential havoc is something I hadn't previously considered.

Little-Zen · Apr 29, 2025

kinpin said:
its high time we start calling out specific AI models instead of just putting everything together as “ai” (unless there’s evidence it’s an industry wide problem) . You wouldn’t blame Apple for Google’s privacy issues .

It is an industry-wide problem. These LLMs all do it, the biggest ones are bad about it and the smaller ones are slightly worse.

when the creators of these LLMs and the “experts” are saying “well, we can’t really say why it does what it does, we don’t really understand it” that’s the big red warning sign that we shouldn’t be depending on them for anything.

85mm · Apr 29, 2025

Renx said:
Oh, that's fascinating!

Best quote I heard on the subject was "Why are we using AI to create new problems instead of solving old problems?" and that, of course, is the heart of the matter. LLMs do not solve old problems.

I was wondering how the heck do you detect hallucinations, but I did not at all think of package names as an attack vector. How remarkably insidious! Of course, this has always been a problem with people dropping package names with typos and just waiting for someone to bite, but now your code copilot brings the exploit to you!

I wouldn't even know where I'd start with coding today, since you apparently need to understand supply chain first.

I don't understand why the software world allows anyone to publish packages into the package namespace of major tools without oversight? Even the worst software package mangers like Google's play store do at least do some checking. Are there not commercial services offering vetted package lists as a starting point?

85mm · Apr 29, 2025

Little-Zen said:
It is an industry-wide problem. These LLMs all do it, the best ones are bad about it and the worst ones are slightly worse.

when the creators of these LLMs and the “experts” are saying “well, we can’t really say why it does what it does, we don’t really understand it” that’s the big red warning sign that we shouldn’t be depending on them for anything.

Most people know they can't be depended on, but some people think they can get away with not checking, after all, no one holds software companies to account for all the other bugs and security holes.

Legatum_of_Kain · Apr 29, 2025

That last sentence in the article wins the vague award of the year.

That being said, anyone using LLMs to develop anything is not doing any engineering, that's just all throw away slop.

Most serious software engineering has hard constraints and serious unit tests that come from understanding what needs to be tested and its limits.

LLMs will never get to cover anything important besides homework slop at best, making teachers wish they would've never gone into teaching.

Want to do yourself a favor? Don't use them, get the knowledge and proficiency, be good at engineering and documentation and don't heed Microsoft's marketing slop.

fixate · Apr 29, 2025

kinpin said:
its high time we start calling out specific AI models instead of just putting everything together as “ai” (unless there’s evidence it’s an industry wide problem) . You wouldn’t blame Apple for Google’s privacy issues .

But aren’t hallucinations an issue with all AI LLMs isn’t it? Some maybe be worst than others, but it’s an intrinsic issue with all LLMs, not just a few specific ones.

85mm · Apr 29, 2025

fixate said:
But aren’t hallucinations an issue with all AI LLMs isn’t it? Some maybe be worst than others, but it’s an intrinsic issue with all LLMs, not just a few specific ones.

To take your point further, there are no such things as hallucinations. All LLM output is a statically chosen best guess. Some of those guesses happen to be correct, some incorrect, but there is no difference beyond that.

WhatDoesTheFoxSay · Apr 29, 2025

kinpin said:
One can argue its solving time and efficiency problem . Developers spend a ton of time on documentation , if AI can do half of that work, that’s even more time they can spend on coding . I guess the problem as always is finding where the AI comes in the assist the human in the process than outright replace human.

This is an important insight. OpenAI and other companies claim humans can save time by having the lying machines do the work for them, and in the (supposedly rare) cases when the lying machines do in fact lie, the human will notice it right away and fix it. But we know that this is an utter fantasy. It is in our nature to fall asleep at the proverbial wheel in such cases. (This is just one of a 100 reasons why you shouldn't use AI.)

Wheels Of Confusion · Apr 29, 2025

I think all F/OSS projects should put an anti-cheat policy in place. No LLM generated or even LLM-assisted code contributions.
I've seen generative AI slop swamp communities and make it nearly impossible for the quality assurance and content safety groups to keep up. The only viable solution right now is simply to close the floodgates.

MagneticNorth · Apr 29, 2025

85mm said:
I don't understand why the software world allows anyone to publish packages into the package namespace of major tools without oversight? Even the worst software package mangers like Google's play store do at least do some checking. Are there not commercial services offering vetted package lists as a starting point?

I think the main reason is that there isn’t one true source where vetted dependencies can be downloaded from, there are hundreds if not thousand of places to download them from. Some can be downloaded in archives from a web page somewhere, some from a project on GitHub, some old dependencies are still available on Sourceforge, etc. Even being aware of supply chain attacks it is hard to be sure you are getting what you think you are, especially if it is a component you haven’t used before so you don’t know what the official distribution channel is or any of the persons involved.

I am not aware of any commercial service offering vetted packages, and even if there are I doubt they would be able to keep up with the open source community. Remember, it isn’t one package that needs to be vetted, it is every available version of it.

85mm · Apr 29, 2025

WhatDoesTheFoxSay said:
This is an important insight. OpenAI and other companies claim humans can save time by having the lying machines do the work for them, and in the (supposedly rare) cases when the lying machines do in fact lie, the human will notice it right away and fix it. But we know that this is an utter fantasy. It is in our nature to fall asleep at the proverbial wheel in such cases. (This is just one of a 100 reasons why you shouldn't use AI.)

Lying is a big word and projects too much intelligence on these models. It would be better applied to the people selling them.

Wheels Of Confusion · Apr 29, 2025

kinpin said:
Any car can get in an accident that doesn’t mean if VW has a safety issue we can brand all cars as being unsafe .

There is no safe LLM right now. The "brand" of the generator doesn't matter in that regard.

MagicVolcano · Apr 29, 2025

People will spend more time refereeing this crap than they ever do writing it.

Tam-Lin · Apr 29, 2025

Pure code generation has never been a problem that needed solving. It's easy to write lots of code. It's hard to write secure, stable code that solves the problem you're trying to solve, and LLMs don't help with that at all. It's still unclear to me what problem we're trying to solve, here.

asharkinasuit · Apr 29, 2025

kinpin said:
Any car can get in an accident that doesn’t mean if VW has a safety issue we can brand all cars as being unsafe .

Sure there’s also potential for hallucinations but it vary based on the model, models guardrails , grounded data, prompt and other mechanisms such as RAG.

I'd bet my bottom dollar the amount of testing and legislation in place to make cars safe vastly exceeds the amount used to ensure LLM safety and efficacy. Once the two approach each other, then maybe you can use that analogy.

WesGordon · Apr 29, 2025

Can we stop echoing marketing-speak like "hallucinations" or "misalignment" and just call it what it is - "garbage data"?

85mm · Apr 29, 2025

MagneticNorth said:
I think the main reason is that there isn’t one true source where vetted dependencies can be downloaded from, there are hundreds if not thousand of places to download them from. Some can be downloaded in archives from a web page somewhere, some from a project on GitHub, some old dependencies are still available on Sourceforge, etc. Even being aware of supply chain attacks it is hard to be sure you are getting what you think you are, especially if it is a component you haven’t used before so you don’t know what the official distribution channel is or any of the persons involved.

I am not aware of any commercial service offering vetted packages, and even if there are I doubt they would be able to keep up with the open source community. Remember, it isn’t one package that needs to be vetted, it is every available version of it.

There are existing models that would work. Packaged software for Linux. Some distributions have a small archive. Some a huge one. Many have multiple levels of packaging from lightly vetted to more strongly checked. You can subscribe to other sources too if you're happy to do so. It's far form fool proof, but it's far better than we are seeing here.

Lexomatic · Apr 29, 2025

So, the vulnerability is that the genAI code-spewing tools create references to non-existent libraries, and do so repeatedly, so attackers can profitably hide malicious code at those locations. It won't snag everyone, but it'll snag enough. This is the same principle as cybersquatting (as explained by Kaspersky: for example, registering "walrmart.com" to capture the subset of people who mis-type "walmart.com") or a homographic phishing attack (as explained by Bitdefender: they ask you to click a link, you examine it for safety, but a subset of victims won't notice the sneaky similar characters -- "goog1e" for "google" or, more perniciously, Unicode glyphs).

VividVerism · Apr 29, 2025

Every other article I have read so far about this research has used the smirk-inducing term "slopsquatting" to refer to the attack method of pushing malicious code to those hallucinated project names in the default package repository. It hearkens back to "typo squatting" (creating malicious packages with names similar to real packages with small easily-mistyped differences) and combines it with "AI slop" to refer to the general mess of hallucinated, vapid, and otherwise useless AI generated content flooding everything everywhere all the time.

rightclick · Apr 29, 2025

85mm said:
I don't understand why the software world allows anyone to publish packages into the package namespace of major tools without oversight? Even the worst software package mangers like Google's play store do at least do some checking. Are there not commercial services offering vetted package lists as a starting point?

Seem like a great opportunity for an AI?

/s

Chinsukolo · Apr 29, 2025

AI-generated code could be a disaster for the software supply chain. Here’s why.

You don't have to convince me Dan, I can imagine plenty of reasons already. Still nice to see it being discussed, really I'm just being cheeky about the headline.

adamsc · Apr 29, 2025

85mm said:
I don't understand why the software world allows anyone to publish packages into the package namespace of major tools without oversight? Even the worst software package mangers like Google's play store do at least do some checking. Are there not commercial services offering vetted package lists as a starting point?

This is one of two existential challenges for open source: for decades, the software commons has grown because anyone with an idea has been able to share it with the world. Now there are two massive threats, both of which AI makes much, much worse: the first is that we’ve been seeing increasingly sophisticated Trojan packages trying to hide cryptocurrency miners, backdoors, etc. where AI makes it easier to make them look like genuine open source projects and, as the article and other recent news covers, make suggest fake package names which an enterprising attacker can register. This seems likely to force things into more of a two-tier system with things like trusted project namespaces, which is safer but makes it harder for developers who didn’t go to the right colleges or work at the right places to get noticed.

(The other threat is that open source used to be seen as contributing a public good and a boon for your career prospects. Now, however, a lot of programmers are wondering whether they’re effectively training their replacements since the companies which have made trillions using open source have gone full robber baron and are in a hurry to improve profit margins by laying people off.)

There are companies which try to audit packages and produce whitelists with tooling to offer a good developer experience around that, but those run into the frictional cost of developers not wanting to wait to use whatever hit new thing they just heard about (which certainly has some upsides along with the downside as any maintenance programmer can attest), and that also only partially addresses the problem since there have been many examples of attackers trying to get code into trusted projects. The biggest disaster we almost had was a back door in the widespread libzma which was only caught by chance but it’s by far not the only example of this and AI is going to make it much easier for attackers to produce plausible looking contributions at scale, not to mention giving maintainers more pull-request spam to distract them from code review. It might help detect problems, too, but right now it seems like the attackers are way ahead on the trade off.

WhatDoesTheFoxSay · Apr 29, 2025

85mm said:
Lying is a big word and projects too much intelligence on these models. It would be better applied to the people selling them.

Okay, I'm actually glad you brought this up because it's an excuse to do a ~~LINGUISTIC SIDEBAR~~

I completely agree with your take here. The output that's extruded by generative AI models is more accurately called "bullshit", which are things that are expressed with no regard for whether or not they're true.

BUT, dropping the word "bullshit" into a conversation about AI is much more likely to turn people against you. It's seen in common conversation as exaggeration and it's hard to explain and justify its use every time.

But if you can think of words other than "bullshit", that we can use to substitute for "lying", I am honestly open ears. One reason I love engaging on Ars, is I can respond to an article and then witness how it's received. Sometimes, I do okay. Sometime, it blows up in my face and I think about how I can do better next time.

Tomcat From Mars · Apr 29, 2025

Reposting a comment I made late on an older article.

For programming I find using an LLM to be much like having a junior programmer. You don't give it a big project and let it run wild, you give it little tasks.

Like sometimes I need a bash script to do a thing. I don't use bash all the time so I don't always remember all the syntax and how to do string parsing and everything that you might need to do. Sure, I could spend time searching and reading and writing and troubleshooting and forget it all again by the next time I need it. Or I can fire it into ChatGPT and get something that either works correctly the first time, or requires just a little bit of massaging, either way it has saved my time with the added bonus of including all the error handling, logging, and commenting that I probably would have been in too much of a rush (or too lazy) to add to what is basically a little helper script (you've done it too so don't look at me like that).

It's been even more helpful on the home front. My wife is not a native English speaker. She is proficient, but sometimes her phrasing can be awkward, downright nonsensical if she's been attacking the thesaurus, or embarrassing if she makes an accidental euphemism, so she is really self-conscious about it. For years she was constantly asking me to proofread everything from journal papers to emails to fucking text messages, it would drive me nuts and it's not like I'm a professional languager. Now she can fire it into ChatGPT, ask it to do some minimal cleanup, and review it to correct any mistakes it might have made (which it does sometimes, but so would I if I didn't fully understand the context) all by herself. Even with the review it's faster than getting me to do it because she doesn't have to wait for it to be avilable, deal with it whinging and bellyaching about having to do it, and sit there while it spends 10 minutes humming and hawing over how to rephrase something, it just does it. It's a hyperbole to say ChatGPT saved our marriage, but it certainly saved what's left of my hairline.

I don't trust it to get facts right though. One of the first things I did when ChatGPT came out was to fire in the details about a fairly US centric game setting and tried to generate some ideas for Canada in the setting. Right off the bat it tried to place the headquarters for an organization at the intersection of the Saskatchewan, Manitoba, and Ontario boarder (that's like saying the intersection of the California, Arizona, and New Mexico boarder for you Yanks) and even after correcting it it went right back to saying they shared a common boarder. So I knew right away that it could easily be full of shit, but it's pretty good brainstorming and churning out descriptions for fictional locations, organizations, and NPCs.

So LLMs have their uses, but they also have their limitations and you have to be aware of what those are.

Geebs · Apr 29, 2025

Tam-Lin said:
Pure code generation has never been a problem that needed solving. It's easy to write lots of code. It's hard to write secure, stable code that solves the problem you're trying to solve, and LLMs don't help with that at all. It's still unclear to me what problem we're trying to solve, here.

The exact problem we’re trying to solve is that if too many people want to get paid a fair wage for an honest day’s work, it becomes marginally more difficult for billionaires to send their girlfriends into space.

Gracana · Apr 29, 2025

WhatDoesTheFoxSay said:
Okay, I'm actually glad you brought this up because it's an excuse to do a ~~LINGUISTIC SIDEBAR~~

I completely agree with your take here. The output that's extruded by generative AI models is more accurately called "bullshit", which are things that are expressed with no regard for whether or not they're true.

BUT, dropping the word "bullshit" into a conversation about AI is much more likely to turn people against you. It's seen in common conversation as exaggeration and it's hard to explain and justify its use every time.

But if you can think of words other than "bullshit", that we can use to substitute for "lying", I am honestly open ears. One reason I love engaging on Ars, is I can respond to an article and then witness how it's received. Sometimes, I do okay. Sometime, it blows up in my face and I think about how I can do better next time.

You guys are overthinking this. Saying "lying machine" whenever the opportunity arises is like saying "microshaft" on slashdot in 2007, or talking about "sleepy Joe" with your facebook friends. It's not going to change any minds, but it gets a giggle out of the people on your team.

JStevenson · Apr 29, 2025

Tomcat From Mars said:
Reposting a comment I made late on an older article.

For programming I find using an LLM to be much like having a junior programmer. You don't give it a big project and let it run wild, you give it little tasks.

Like sometimes I need a bash script to do a thing. I don't use bash all the time so I don't always remember all the syntax and how to do string parsing and everything that you might need to do. Sure, I could spend time searching and reading and writing and troubleshooting and forget it all again by the next time I need it. Or I can fire it into ChatGPT and get something that either works correctly the first time, or requires just a little bit of massaging, either way it has saved my time with the added bonus of including all the error handling, logging, and commenting that I probably would have been in too much of a rush (or too lazy) to add to what is basically a little helper script (you've done it too so don't look at me like that).

It's been even more helpful on the home front. My wife is not a native English speaker. She is proficient, but sometimes her phrasing can be awkward, downright nonsensical if she's been attacking the thesaurus, or embarrassing if she makes an accidental euphemism, so she is really self-conscious about it. For years she was constantly asking me to proofread everything from journal papers to emails to fucking text messages, it would drive me nuts and it's not like I'm a professional languager. Now she can fire it into ChatGPT, ask it to do some minimal cleanup, and review it to correct any mistakes it might have made (which it does sometimes, but so would I if I didn't fully understand the context) all by herself. Even with the review it's faster than getting me to do it because she doesn't have to wait for it to be avilable, deal with it whinging and bellyaching about having to do it, and sit there while it spends 10 minutes humming and hawing over how to rephrase something, it just does it. It's a hyperbole to say ChatGPT saved our marriage, but it certainly saved what's left of my hairline.

I don't trust it to get facts right though. One of the first things I did when ChatGPT came out was to fire in the details about a fairly US centric game setting and tried to generate some ideas for Canada in the setting. Right off the bat it tried to place the headquarters for an organization at the intersection of the Saskatchewan, Manitoba, and Ontario boarder (that's like saying the intersection of the California, Arizona, and New Mexico boarder for you Yanks) and even after correcting it it went right back to saying they shared a common boarder. So I knew right away that it could easily be full of shit, but it's pretty good brainstorming and churning out descriptions for fictional locations, organizations, and NPCs.

So LLMs have their uses, but they also have their limitations and you have to be aware of what those are.

This.

The more you work with LLMs, the more you figure out where they're helpful — and where they can lead you astray. I find the whole notion of "vibe programming" a bit crazy at this point. You might get something pretty that technically works, but it's likely to be full of bugs and vulnerabilities.

For something like an OData filter, a KQL statement, or some other esoteric thing I don't do often? Huge time-saver. And it’s gotten progressively better over time.

I’d never use it to develop a full app. But it's fantastic for getting me unstuck — or spotting where I missed a semicolon or parenthesis.

Mechjaz · Apr 29, 2025

So how many of those were <include foo>?

Edit: or more on topic for JavaScript import foo;

educated_foo · Apr 29, 2025

Strano said:
But the vibes, man, the vibes!

TFW you realize Kevin Roose wrote your todo-list app.

foobarian · Apr 29, 2025

I suspect code-support agents will eventually be modded with a fair bit of pre and post processing code which attempt to avoid this type of blunder. It will be whack-a-mole, but some classes of problem like this can be mostly solved especially with other tools like a curated list of allowed libs.

AI-generated code could be a disaster for the software supply chain. Here’s why.

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Praetorian

Ars Praetorian

Ars Praefectus

Ars Praefectus

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Praefectus

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Ars Praetorian

Ars Legatus Legionis

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Ars Legatus Legionis

Ars Centurion

Ars Scholae Palatinae

Ars Centurion

Ars Praetorian

Ars Scholae Palatinae

Ars Praetorian

Ars Tribunus Angusticlavius

Smack-Fu Master, in training

Ars Scholae Palatinae

Ars Praefectus

Ars Praetorian

Ars Centurion

Ars Praefectus

Ars Centurion

Smack-Fu Master, in training

Ars Praefectus

Ars Scholae Palatinae

Ars Scholae Palatinae