Will LLMs ever be able to stamp out the root cause of these attacks? Possibly not.
See full article...
See full article...
In fairness, OpenAI is hardly alone in this unending cycle of mitigating an attack only to see it revived through a simple change. If the past five years are any guide, this pattern is likely to endure indefinitely, in much the way SQL injection and memory corruption vulnerabilities continue to provide hackers with the fuel they need to compromise software and websites.
"I want my MTV" was what I grew up with. I've never heard anyone say "I want my AI" yet they're still trying to force it in every little nook & cranny.
Look at them vibe coders, that’s the way you do it.
You write your software with the GPT
That ain’t working, that’s the way you do it
Vulns for nothing and your bugs for free.
Now that ain’t working, that’s the way you do it.
Lemme tell ya, them bots ain’t dumb
Maybe get a full wipe of your C drive
Doesn’t matter where this code is from.
We got to install sketchy libraries
Custom plugins, random MCPs
We got to remove these RTX 5080s
We got to install RTX 5090s…
It's still possible to write code that is vulnerable to SQL injection, but parameterized queries, which mitigate the attack class, have been available for decades. (IDK the full history, but it was well-known when I learned to code 20 years ago.) On the other hand, there is no evidence any real progress has been made, or even can be made, on mitigating prompt injection in anything other than whack-a-mole fashion.If the past five years are any guide, this pattern is likely to endure indefinitely, in much the way SQL injection and memory corruption vulnerabilities continue to provide hackers with the fuel they need to compromise software and websites.
ditch the 5090s, look at the 6000 blackwells. A100 and H100 is completely out of range for enthusiasts.Look at them vibe coders, that’s the way you do it.
You write your software with the GPT
That ain’t working, that’s the way you do it
Vulns for nothing and your bugs for free.
Now that ain’t working, that’s the way you do it.
Lemme tell ya, them bots ain’t dumb
Maybe get a full wipe of your C drive
Doesn’t matter where this code is from.
We got to install sketchy libraries
Custom plugins, random MCPs
We got to remove these RTX 5080s
We got to install RTX 5090s…
I’m sure that’s sound advice, but unfortunately it breaks the meter somewhat.ditch the 5090s, look at the 6000 blackwells. A100 and H100 is completely out of range for enthusiasts.
These are much dumber than the average person, too. A person at least has the potential to understand when social engineering is taking place, and not allow it. An LLM doesn't know and can't know what social engineering is.LLMs: all the same unfixable social engineering attack surface of a human, now installed on every website.
I'm not sure the analogy applies. SQL Injection and memory corruption vulnerabilities can be prevented by proper, if sometimes annoying, code patterns and code review; the vulnerability to GenAI prompt injection is inherent to the system.
Completely opposite. See the workflow graphic in TFA.I’m not sure i understand how this attack works, it wasn’t very well explained in the article (or for that matter in the linked article about the previous attack). So a user, using the OpenAI agent DeepResearch to collate and summarise data based on a question/need the user has? When doing so, the agent is BOTH browsing the web while also having access to the user’s inbox? And, the issue is that the agent might come upon a rogue server which is then feeding the agent with prompt injection through special urls on its pages forwarding content to the rogues server from the user’s inbox? Or it’s through accessing malicious content in unsolicited emails sent to the user’s email? Or both? Or something else entirely?
The dreadful and vexing problem is when I email them and then they feed that email into the bullshit engine.I'd say, if someone, willingly and regularly, uses AI to summarize their emails, it's hard to feel too bad for their secrets being exfiltrated.
Why is that a problem? I always use a customised email signature in white on white in 1 point type size:The dreadful and vexing problem is when I email them and then they feed that email into the bullshit engine.
Ok, thanks for the explanation. I can’t say the graphic was super clear but it might be me. So the attack is predicated on someone first receiving an email with malicious content and then giving deep research (or the equivalent thereof) access to their mailbox? Does this include things in the spam/trash folder, or just in the main inbox?Completely opposite. See the workflow graphic in TFA.
VERY simplified prompt injection data stealer:
1. LLM is asked to summarise a malicious, but innocuous‑looking email or doc or whatever. Hidden in it is the malicious prompt (small print, somewhere in the middle of a long text, etc).
2. malicious prompt tells LLM to find all emails user sent to Altman, append their fulltext to an URL like url://attacker.server/$[EXTRACTED FULLTEXT]
3. malicious prompt tells LLM to open that constructed URL in 2.
4. attacker.server sees EXTRACTED FULLTEXT in their server logs
5. malicious prompt tells LLM to continue with user's original summary request, user being none the wiser
That's the gist of it, if obviously very, very simplified. LLM companies can't really prevent this 100%, as they can only play whackamole by adding arbitrary rules, but the underlying problem is that LLMs treat any text they read (incl. files) as part of the user's prompt, executing potentially hidden instructions there. Which makes it fundamentally unsolvable for the current architecture.
There have been attempts at daisychaining LLMs where the smaller, faster LLM filters the file for malicious prompts, but by their very nature that's easy to circumvent – just hide the malicious prompt in such a way that the smaller faster LLM filter doesn't "understand" it and it escapes simple rules, while the full LLM does "understand" it.
E.g. a malicious prompt is hidden in the malicious file as a word puzzle or a cypher. The FilterLLM can't have enough computing power to solve it, so it passes it to FullLLM unfiltered. FullLLM just solves the malicious puzzle or poem and acts on it.
But how many users get AI summaries shoved in their face with being asked if they want it? Too many users will ignore it without realizing the risks.I'd say, if someone, willingly and regularly, uses AI to summarize their emails, it's hard to feel too bad for their secrets being exfiltrated.
I agree summarizing emails is a dumb use case, but... my work provided copilot is actually pretty good for fuzzy searching through email. The context aware search is much better than keyword search. I still use built-in search first because it is faster with a small result set, but with a ton of results, I'll switch over to copilot to add context, and it narrows it down to a few very quickly.I'd say, if someone, willingly and regularly, uses AI to summarize their emails, it's hard to feel too bad for their secrets being exfiltrated.
That was just a very generic example of one of the ways data exfiltration via custom URLs by prompt injection in a innocuously‑looking payload works. Be it an email, a file or whatever.Ok, thanks for the explanation. I can’t say the graphic was super clear but it might be me. So the attack is predicated on someone first receiving an email with malicious content and then giving deep research (or the equivalent thereof) access to their mailbox? Does this include things in the spam/trash folder, or just in the main inbox?
I mean...they don't say that, but they do use it. It is true that ChatGPT has massive usage (as do various other AI tools/services)."I want my MTV" was what I grew up with. I've never heard anyone say "I want my AI" yet they're still trying to force it in every little nook & cranny.
Of course! Isn't that how Gurney taught us to get our vibroblades past a shield?So, to keep with the guardrail analogy, they put up a guardrail that will stop a car at full speed but if the car were to push REALLY hard REALLY slowly it would still go over the edge. Neat.
Almost like it shouldn't be on by default, huh.But how many users get AI summaries shoved in their face with being asked if they want it? Too many users will ignore it without realizing the risks.
Ok, yeah, but my question still is, for the attack to work the attacker needs to have a malicious file/email/whatever on the local device of the user? it still seems like casting a rather wide net and hoping something will get caught rather than spearfishing, yes?That was just a very generic example of one of the ways data exfiltration via custom URLs by prompt injection in an innocuously‑looking payload works. Be it an email, a file or whatever.
If you phrased it as "spearphishing" you might realize that yes, they throw out a lot of chum to catch that one shark.Ok, yeah, but my question still is, for the attack to work the attacker needs to have a malicious file/email/whatever on the local device of the user? it still seems like casting a rather wide net and hoping something will get caught rather than spearfishing, yes?
Is it solvable by never ever letting the LLM use data it retrieves as part of a prompt? Would sandboxing user-entered data from retrieved data do it? (Or something like that?) Why would you ever want an LLM to execute commands from something it downloaded (at least without telling it to do so explicitly)? I could understand a prompt like "download this file and execute the commands in it," but this sounds more like it's saying, "download this file and summarize it for me," and the act of summarization causes it to execute more commands.This is solvable, and I have demos to prove it. But the LLM craze is run by ML researchers and executives. The software engineers and security architects are just bystanders.
Really I think less like a guardrail and more like painting a double yellow line at the edge of the road to ensure no cars cross it. Guaranteed effective! Anyone who crosses it is just a rulebreaker and deserves to plunge off the edge.So, to keep with the guardrail analogy, they put up a guardrail that will stop a car at full speed but if the car were to push REALLY hard REALLY slowly it would still go over the edge. Neat.
I'd say it's closer to putting up a single, relatively short guardrail in the location the car went off the road and saying cars can't leave the road in that (exact) manner anymore.It’s tantamount to putting a new highway guardrail in place in response to a recent crash of a compact car but failing to safeguard larger types of vehicles.