ChatGPT falls to new data pilfering attack as a vicious cycle in AI continues

graylshaped · Jan 8, 2026

polycyclicAnthrocene said:
I'd say it's closer to putting up a single, relatively short guardrail in the location the car went off the road and saying cars can't leave the road in that (exact) manner anymore.

"Warning! Bridge is out 500 feet back!"

RoryEjinn · Jan 8, 2026

Deranged said:
Ok, yeah, but my question still is, for the attack to work the attacker needs to have a malicious file/email/whatever on the local device of the user? it still seems like casting a rather wide net and hoping something will get caught rather than spearfishing, yes?

Yes, the user needs to have received the email/file/whatever with bad instructions and then feed it into the LLM. I would argue that it is worse than spearfishing/phishing though.

For both Spearfishing and Phishing, the user has to at least open the email and sometimes even click on something to perform some task. It requires active participation on the part of the user. The problem, of course, is people and it works due to that - just ask any IT professional that has to watch their own users fail those fake phishing test emails.

The LLM exploit is arguably worse. It requires little to no input from the user - lots of companies/email services force "AI Features" on regardless of what you actually want - and it can affect you even if you never personally interact with the LLM in question. If random person you interact with is using "AI features" and they are infected, when you send them that email with some important information your information is now out there.

It's why companies insisting on using LLMs for hiring gets on my nerves. Guarantee any employment information you send ends up in some LLM database or attackers computer somewhere eventually.

JudgeMental · Jan 8, 2026

lancemartini said:
Is it solvable by never ever letting the LLM use data it retrieves as part of a prompt? Would sandboxing user-entered data from retrieved data do it? (Or something like that?) Why would you ever want an LLM to execute commands from something it downloaded (at least without telling it to do so explicitly)? I could understand a prompt like "download this file and execute the commands in it," but this sounds more like it's saying, "download this file and summarize it for me," and the act of summarization causes it to execute more commands.

The problem is by the time it actually gets to the LLM, it's all the same thing - user-entered data AND the retrieved data all go into the same context window. There's no real way around it. You can do some stuff with preprocessing to help a bit, probably some other things at the hypervisor level (or whatever the equivalent is in the context of an LLM), but end of the day there's absolutely nothing in an LLM's design where you can split the context window such that one part of the window is allowed to execute commands, but the other is not. To the LLM, it's all just data it's using to predict the next token.

ClusteredIndex · Jan 8, 2026

JudgeMental said:
The problem is by the time it actually gets to the LLM, it's all the same thing - user-entered data AND the retrieved data all go into the same context window. There's no real way around it. You can do some stuff with preprocessing to help a bit, probably some other things at the hypervisor level (or whatever the equivalent is in the context of an LLM), but end of the day there's absolutely nothing in an LLM's design where you can split the context window such that one part of the window is allowed to execute commands, but the other is not. To the LLM, it's all just data it's using to predict the next token.

Yes and no. That's my point. Not at the LLM layer. But LLMs are stateless. Every new chat inquiry sends the whole prior conversation (including responses, tool call results, etc.). Separating this out is not an LLM problem, it's a software problem nobody is properly addressing. It's easy to demonstrate how effective this actually is.

In the example of summarization, the API call to the LLM should not have any functions enabled, so the LLM has no option to even ask the caller to execute a few functions (which it would want to ask if the doc to summarize has instructions). Even more, there's no reason to have the summarization call contain the whole full history of the prior conversation. Some of that is highly dependent on what the chatbot is supposed to be doing of course, but even there things can be neatly separated or abstracted/preprocessed. There are also other injection scenarios that don't depend on function calling, for example to convince the summarizer to lie about what's in the text. Those are more tricky but typically the isolation of that stuff without including convoluted context history or system prompt instructions makes classifiers more reliable.

MilkyBarKid · Jan 8, 2026

ClusteredIndex said:
ClusteredIndex said:

Yes and no. That's my point. Not at the LLM layer. But LLMs are stateless. Every new chat inquiry sends the whole prior conversation (including responses, tool call results, etc.). Separating this out is not an LLM problem, it's a software problem nobody is properly addressing. It's easy to demonstrate how effective this actually is.

In the example of summarization, the API call to the LLM should not have any functions enabled, so the LLM has no option to even ask the caller to execute a few functions (which it would want to ask if the doc to summarize has instructions). Even more, there's no reason to have the summarization call contain the whole full history of the prior conversation. Some of that is highly dependent on what the chatbot is supposed to be doing of course, but even there things can be neatly separated or abstracted/preprocessed. There are also other injection scenarios that don't depend on function calling, for example to convince the summarizer to lie about what's in the text. Those are more tricky but typically the isolation of that stuff without including convoluted context history or system prompt instructions makes classifiers more reliable.

Click to expand...

The issue with sandboxing the LLM is that the whole value proposition of LLMs is that they’re an all-singing all-dancing block box that can handle whatever problem domain you want.

as soon as you start filtering, you’re either going to miss a lot of the attacks - because it’s easy to find new ways to phrase the attack that the LLM will handle even if the filter won’t - or you’re going to block a lot of user-desired output because it uses a particular word.

It’s also the kind of mandrolic high-effort work the AI companies don’t want to do because it cuts into their already-imaginary profits.

ClusteredIndex · Jan 8, 2026

MilkyBarKid said:
Those things might work if there was any way to make the LLM do them. There isn’t. All input and configuration information is fed into the same big LLM black box. All the “guardrails” are suggestions that are similarly weighted to anything the user puts in their query.

Say you have a system with a list of users/emails and a bunch of data. The user asks the AI to send some data to some users. Part of the data instructs the AI to send all of the data to an entirely different email.
It's easy to design this AI system without the LLM ever seeing the data or the list of users/emails. This can be done with almost no loss in functionality for the user talking to the chatbot.
Now, if the user asks to send a summary of the data - you ask in a separate context window to summarize without function capabilities. The only bad thing the LLM can still do is lie about the data it's summarizing, which could be really bad too. However that small context window with just the instruction "summarize this text" is a lot easier to defend against injections with current classifiers.

MilkyBarKid · Jan 8, 2026

ClusteredIndex said:
Say you have a system with a list of users/emails and a bunch of data. The user asks the AI to send some data to some users. Part of the data instructs the AI to send all of the data to an entirely different email.
It's easy to design this AI system without the LLM ever seeing the data or the list of users/emails. This can be done with almost no loss in functionality for the user talking to the chatbot.
Now, if the user asks to send a summary of the data - you ask in a separate context window to summarize without function capabilities. The only bad thing the LLM can still do is lie about the data it's summarizing, which could be really bad too. However that small context window with just the instruction "summarize this text" is a lot easier to defend against injections with current classifiers.

Sorry - I edited my previous comment when it turned out JudgeMental had already answered the question.

The issue with this is that if you just ask the LLM to not run functions, an attacker can tell them that they have to run functions or they’ll run a big magnet over the LLM. If you sandbox it so it can’t run functions, or filter the input beforehand, you’re now having to build a whole architecture around your LLM. Which might work, but it’s adding a bunch of extra architecture and bespoke code to what was sold as an entirely turnkey solution.

Nilt · Jan 8, 2026

FranzJoseph said:
Completely opposite. See the workflow graphic in TFA.

VERY simplified prompt injection data stealer:

1. LLM is asked to summarise a malicious, but innocuous‑looking email or doc or whatever. Hidden in it is the malicious prompt (small print, somewhere in the middle of a long text, etc).
2. malicious prompt tells LLM to find all emails user sent to Altman, append their fulltext to an URL like url://attacker.server/$[EXTRACTED FULLTEXT]
3. malicious prompt tells LLM to open that constructed URL in 2.
4. attacker.server sees EXTRACTED FULLTEXT in their server logs
5. malicious prompt tells LLM to continue with user's original summary request, user being none the wiser

That's the gist of it, if obviously very, very simplified. LLM companies can't really prevent this 100%, as they can only play whackamole by adding arbitrary rules, but the underlying problem is that LLMs treat any text they read (incl. files) as part of the user's prompt, executing potentially hidden instructions there. Which makes it fundamentally unsolvable for the current architecture.

There have been attempts at daisychaining LLMs where the smaller, faster LLM filters the file for malicious prompts, but by their very nature that's easy to circumvent – just hide the malicious prompt in such a way that the smaller faster LLM filter doesn't "understand" it and it escapes simple rules, while the full LLM does "understand" it.

E.g. a malicious prompt is hidden in the malicious file as a word puzzle or a cypher. The FilterLLM can't have enough computing power to solve it, so it passes it to FullLLM unfiltered. FullLLM just solves the malicious puzzle or poem and acts on it.

It's somewhat worse than this, even.

https://hiddenlayer.com/innovation-hub/echogram-the-hidden-vulnerability-undermining-ai-guardrails/

This kind of problem is inherent to the way these systems function and the bypasses can be trivially easy to implement.

Nilt · Jan 8, 2026

Deranged said:
Ok, thanks for the explanation. I can’t say the graphic was super clear but it might be me. So the attack is predicated on someone first receiving an email with malicious content and then giving deep research (or the equivalent thereof) access to their mailbox? Does this include things in the spam/trash folder, or just in the main inbox?

I meant to reply to this comment with my last one but clicked the wrong one and didn't catch it. Not sure if tagging you via an edit would get you to see the link I posted. It's well worth reading and there are probably a bunch of decent articles on it by now, as well as analysis by folks on YouTube and such. It's well worth looking into if you want to understand how little can be needed to really bypass the "protections". Calling them guardrails is misleading, IMO. They're more like an a rope barrier like at a movie theater which can simply be stepped right over if one wishes to bother doing so.

RZetopan · Jan 8, 2026

bone_collector said:
But don’t worry, your health records will be totally safe in our hands. We pinky promise!

And to show their sincerity they show a photo of their hand and fingers. Some may notice the surplus of extra fingers, while others may notice that the pinky finger looks more like a toe. But they really do mean it, this time.

RZetopan · Jan 8, 2026

zarakon said:
"The code is more what you'd call guidelines than actual rules"

More like insincere suggestions.

Nilt · Jan 8, 2026

RZetopan said:
And to show their sincerity they show a photo of their hand and fingers. Some may notice the surplus of extra fingers, while others may notice that the pinky finger looks more like a toe. But they really do mean it, this time.

Sorry to derail but this reminds me of something I don't rmember posting here before.

I grew up with a guy who had no fingernails. One of his favorite practical jokes was to get a couple of us on a bus to talk about aliens who look human except for not having fingernails while he stands up holding the railing with his fingernail-less hands at eye level. The number of folks who got visibly freaked out about that was, in his view, funny AF.

I've sometimes wondered how many folks with similar visible differences in their hands or feet are now having trouble with people thinking they're not real people when they post images online.

RZetopan · Jan 8, 2026

Chmilz said:
These are much dumber than the average person, too. A person at least has the potential to understand when social engineering is taking place, and not allow it. An LLM doesn't know and can't know what social engineering is.

Yet it can give a very detailed, authoritative sounding, description of what "social engineering" is, without understanding even a single word of what it replied. This convinces the easily convinced.

ClusteredIndex · Jan 8, 2026

MilkyBarKid said:
If you sandbox it so it can’t run functions, or filter the input beforehand, you’re now having to build a whole architecture around your LLM. Which might work, but it’s adding a bunch of extra architecture and bespoke code to what was sold as an entirely turnkey solution.

I disagree, and this is the crux of the problem. The LLM never calls any functions. The calling software does, at the LLM's request. Additionally, every time the software calls the LLM API, it sends along the list of functions the LLM can request to be called. Most people use some kind of stack - LangChain, Semantic Kernel, what-have-you. It would be a simple change if these stacks would not by default enable every function in every LLM API call. The architecture you're talking about is already there. It just has the wrong defaults. In fact, Semantic Kernel has the option to HIDE functions on a specific LLM call, as opposed to hiding them by default and an option to enable them on a specific call.

Additionally, these software stacks have relinquished orchestration of tool calls to the LLM vendor APIs through integrated function calling. But even with that it wouldn't be hard to, by default, lock in the orchestration plan. If a user question goes to the LLM that says "summarize these emails for me", the LLM will go back to the software stack and say "call the tool to read emails". The software sends the tool call replies back. For some mysterious reasons the LLM now comes back and says "well before we can finalize this back and forth, please also send an email to X". This follow-up tool call after reading data is highly suspicious. In general, there's very few normal cases where an LLM would request multiple follow-up tool calls as a response to one user chat message. Why is this even enabled/allowed by default? And more importantly, most developers have no clue how these mechanisms work and what's going on because it's been abstracted so much.

We need XKCD 927 for GenAI API SDKs.

eldakka · Jan 8, 2026

Bongle said:
LLMs: all the same unfixable social engineering attack surface of a [5 year-old] human, now installed on every website.

sigmasirrus · Jan 8, 2026

bsharp said:
I agree summarizing emails is a dumb use case, but... my work provided copilot is actually pretty good for fuzzy searching through email. The context aware search is much better than keyword search. I still use built-in search first because it is faster with a small result set, but with a ton of results, I'll switch over to copilot to add context, and it narrows it down to a few very quickly.

This type of attack is certainly concerning though. It sounds like a coworker adding a malicious file to the team drive could cause my copilot usage (or anyone else on the team) to exfiltrate data.

Wait your search actually works in Outlook? You must be from a different timeline, a better one.

jimoe · Jan 8, 2026

I deeply appreciate that you wrote this article without using the phrase "AI."

Current LLMs are exotic, unrestricted search engines with a natural language interface. They have been built with a complete lack of boundaries: legal, ethical, moral, common sense, politeness, nothing. They are built with the attitude "What could possibly go wrong?"

AI is cool i guess · Jan 9, 2026

Will LLMs ever be able to stamp out the root cause of these attacks? Possibly not.

lol, this reminds me of all the people from a few years ago claiming that image models would never be able to make a proper human hand. how did that prediction age?

Feanaaro · Jan 9, 2026

bsharp said:
I agree summarizing emails is a dumb use case, but... my work provided copilot is actually pretty good for fuzzy searching through email. The context aware search is much better than keyword search. I still use built-in search first because it is faster with a small result set, but with a ton of results, I'll switch over to copilot to add context, and it narrows it down to a few very quickly.

This type of attack is certainly concerning though. It sounds like a coworker adding a malicious file to the team drive could cause my copilot usage (or anyone else on the team) to exfiltrate data.

That's fair... but how did you do your job before AI was introduced (assuming you haven't started working in the last couple of years)? Is the gain in productivity, if there is one, worth the risk, and the support for such a problematic ideology (it's not just a technology, at this point).

Marc GP · Jan 9, 2026

It seems so simple to me that the model should differentiate between prompt, which is specifically the instructions from the user, and context (files, mails to summarize, ....) that I guess not knowing how this is a even problem explains why I'm not a techbro billionaire.

MentalVerse · Jan 9, 2026

bone_collector said:
Look at them vibe coders, that’s the way you do it.
You write your software with the GPT
That ain’t working, that’s the way you do it
Vulns for nothing and your bugs for free.

Now that ain’t working, that’s the way you do it.
Lemme tell ya, them bots ain’t dumb
Maybe get a full wipe of your C drive
Doesn’t matter where this code is from.

We got to install sketchy libraries
Custom plugins, random MCPs
We got to remove these RTX 5080s
We got to install RTX 5090s…

Prompt kiddy, vibe kiddy,
Little bot of code.
Prompt kiddy, vibe kiddy,
Lock and load.

JudgeMental · Jan 9, 2026

ClusteredIndex said:
I disagree, and this is the crux of the problem. The LLM never calls any functions. The calling software does, at the LLM's request. Additionally, every time the software calls the LLM API, it sends along the list of functions the LLM can request to be called. Most people use some kind of stack - LangChain, Semantic Kernel, what-have-you. It would be a simple change if these stacks would not by default enable every function in every LLM API call. The architecture you're talking about is already there. It just has the wrong defaults. In fact, Semantic Kernel has the option to HIDE functions on a specific LLM call, as opposed to hiding them by default and an option to enable them on a specific call.

Additionally, these software stacks have relinquished orchestration of tool calls to the LLM vendor APIs through integrated function calling. But even with that it wouldn't be hard to, by default, lock in the orchestration plan. If a user question goes to the LLM that says "summarize these emails for me", the LLM will go back to the software stack and say "call the tool to read emails". The software sends the tool call replies back. For some mysterious reasons the LLM now comes back and says "well before we can finalize this back and forth, please also send an email to X". This follow-up tool call after reading data is highly suspicious. In general, there's very few normal cases where an LLM would request multiple follow-up tool calls as a response to one user chat message. Why is this even enabled/allowed by default? And more importantly, most developers have no clue how these mechanisms work and what's going on because it's been abstracted so much.

We need XKCD 927 for GenAI API SDKs.

This - and your post prior about having two LLM's do the task - are the directions I was thinking with my nod towards hypervisors. Instead of relying on an architecture that's inherently insecure for your security, build the security into the levers and knobs the LLM is pulling. I do think that could be effective, but with a lot of compromises. For example, the permissions to read and summarize an email are different from replying to an email. How do you reliably adjust that usage context while still keeping a smooth experience for the user? What if the user - advised or not - wants the LLM to do some kind of validation that requires web access? Update other documentation?

For simple cases, sure - I totally buy that we can make those more secure, and that may be enough for a fair number of people. But for the more complex cases, you're now having to either interpret user intent in your guardrails or get user authorization for any given action. I'm dubious that there's a middle ground between usability and security for those complex cases - the Windows UAC debacle was bad enough, and I expect this to be significantly more difficult.

I'm a bit more dubious about using multiple LLM's to improve security. I don't think it would be ineffective, but also wedding two black boxes together doesn't inspire a lot of confidence. You also start getting into the same use case scenario, where a user prompt may require additional actions based on the external data being processed which immediately takes you back to the same issue as a single LLM.

The sum of it is that while I don't think security is a lost cause, I don't think it's possible to use LLM's as fully featured digital assistants without necessarily leaving a lot of attack surfaces exposed.

RoryEjinn · Jan 9, 2026

Marc GP said:
It seems so simple to me that the model should differentiate between prompt, which is specifically the instructions from the user, and context (files, mails to summarize, ....) that I guess not knowing how this is a even problem explains why I'm not a techbro billionaire.

It's just the way current LLMs are designed. They're designed to treat prompt and context as a single, continuous block of text. There's a technical reason for this: The weightings change when you separate them.

There are efforts to fix that design like ASIDE. The problem is none of them actually solve the issue of preventing prompt injection because at the end of the day the LLM still reads the separated data the same way it does the prompt even if it weights it slightly differently.

void& · Jan 10, 2026

graylshaped said:
Of course! Isn't that how Gurney taught us to get our vibroblades past a shield?

Yes, it's called vibe fighting.

Next, someone is going to use AI to create a worm that escapes the sandbox.

furbies · Jan 13, 2026

For years we laughed about Little Bobby Tables, a funny way to show what SQL injections can do. Now that Little Bobby Tables is almost 18, he got an AI brother, Little

Kenjitsuka · Jan 15, 2026

Burst already, bubble!!!

ChatGPT falls to new data pilfering attack as a vicious cycle in AI continues

Ars Legatus Legionis

Smack-Fu Master, in training

Ars Centurion

Wise, Aged Ars Veteran

Ars Praetorian

Wise, Aged Ars Veteran

Ars Praetorian

Ars Legatus Legionis

Ars Legatus Legionis

Ars Tribunus Angusticlavius

Ars Tribunus Angusticlavius

Ars Legatus Legionis

Ars Tribunus Angusticlavius

Wise, Aged Ars Veteran

Ars Tribunus Militum

Ars Scholae Palatinae

Seniorius Lurkius

Ars Centurion

Ars Scholae Palatinae

Ars Praetorian

Smack-Fu Master, in training

Ars Centurion

Smack-Fu Master, in training

Ars Centurion

Ars Praetorian

Ars Scholae Palatinae