Supply-chain attack using invisible code hits GitHub and other repositories

HiggsForce

Ars Scholae Palatinae
680
Subscriptor
Unicode is a never-ending source of security problems, from unnormalized UTF-8 to punycode lookalike domains, to this.

A lot of the problems come down to canonicalization being pervasive throughout Unicode processing, combined with widespread (and often intentional) differences in how systems canonicalize Unicode: some systems treat two Unicode characters as distinct things (for example, displaying one of them as empty), while other systems canonicalize them to the same thing. That mismatch is a perennial source of security trouble.
 
Upvote
84 (89 / -5)
How hard would it be for Github, NPM, etc. to check for and flag these somehow? "Hey, this source code has invisible characters in it, look out!"
Yes, that seems easy to automate. Checking millions of files would take a huge amount of processing, but it would be worth it to remove the uncertainty.

For the future, pulls and check-ins could be scanned automatically. For the typo-squatters, their project pages could have a red banner added stating that their code includes invisible characters so be cautious about using it.

Package managers could scan new downloads and pop up warnings saying the package might be tainted, and ask for confirmation before adding it.
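As a rough sketch of what such a scan could look like (the function name and reporting format here are illustrative, not any registry's actual tooling), it only needs to walk each file and flag codepoints in the variation-selector ranges described in the article:

```javascript
// Hypothetical registry-side check: flag source text containing invisible
// variation selectors (U+FE00-U+FE0F and U+E0100-U+E01EF).
function findInvisibleCharacters(source) {
  const findings = [];
  let index = 0;
  for (const ch of source) {
    const cp = ch.codePointAt(0);
    if ((cp >= 0xfe00 && cp <= 0xfe0f) || (cp >= 0xe0100 && cp <= 0xe01ef)) {
      findings.push({ index, codepoint: "U+" + cp.toString(16).toUpperCase() });
    }
    index += ch.length; // track UTF-16 offsets; astral codepoints take two units
  }
  return findings;
}
```

A registry could surface the red banner, or a package manager its confirmation prompt, whenever `findings` is non-empty.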
 
Upvote
48 (48 / 0)
I'm surprised that editors and terminals render those characters as blank, instead of inserting a □ missing/unsupported character or an � invalid/unrecognizable character. (Edit: or the  object replacement character. Another choice would be 􏿮 .notdef character that is used when the font is missing the character.) A quick search shows that most programming languages support a wide range of unicode characters (E.g., the private use range in some version of Clang C++). Unicode has a technical standard for mitigating some of the problems: https://www.unicode.org/reports/tr55/

Personally, I avoid non-ascii characters in my code, with unicode strings the only exception.
 
Last edited:
Upvote
82 (82 / 0)

Uncivil Servant

Ars Scholae Palatinae
4,751
Subscriptor
With the caveat that all my formal education on coding comes from law classes, I cannot wrap my head around how this weakness exists.

I cannot be the only person in the entire world who has had code fail to compile because of one little typo in a variable or because I forgot to end bracket a clause.

But invisible unicode? That compiles just fine? Sure, the problem could be ingenious hackers, it could be LLMs, or maybe the problem is compilers written by people so gullible they use phrases like "zero trust" unironically.

This just seems like such an obvious behavior to never implement, under any circumstances, I don't care how many drinks you've had you'll regret it in the morning. Not "whoopsie, how'd we let it run that code".

Seriously, does anyone in IT do policy & planning and view "chaotic neutral" as more of an orientation than an alignment?
 
Upvote
-6 (29 / -35)

HiggsForce

Ars Scholae Palatinae
680
Subscriptor
With the caveat that all my formal education on coding comes from law classes, I cannot wrap my head around how this weakness exists.

I cannot be the only person in the entire world who has had code fail to compile because of one little typo in a variable or because I forgot to end bracket a clause.

But invisible unicode? That compiles just fine? Sure, the problem could be ingenious hackers, it could be LLMs, or maybe the problem is compilers written by people so gullible they use phrases like "zero trust" unironically.

This just seems like such an obvious behavior to never implement, under any circumstances, I don't care how many drinks you've had you'll regret it in the morning. Not "whoopsie, how'd we let it run that code".

Seriously, does anyone in IT do policy & planning and view "chaotic neutral" as more of an orientation than an alignment?
Programming languages often contain string constants, which can contain arbitrary Unicode. There are many invisible Unicode characters, and some of them are necessary for properly rendering text in some human scripts. A blanket ban on invisible characters would cause internationalization chaos.
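For a concrete illustration (these particular characters are my examples, not from the article): U+FE0F, which sits in one of the abused variation-selector ranges, is also what requests emoji presentation for a text glyph, and zero-width joiners are how composite emoji are built:

```javascript
// Legitimate, invisible-by-design Unicode in ordinary string data:
const textHeart = "\u2764";        // heart rendered as a text glyph
const emojiHeart = "\u2764\uFE0F"; // VS16 requests emoji presentation
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // man+ZWJ+woman+ZWJ+girl

// The two hearts can look nearly identical but are different strings,
// and one visible "family" glyph is eight UTF-16 code units:
console.log(textHeart === emojiHeart); // false
console.log(family.length);            // 8
```

Stripping or rejecting the invisible characters here would silently change what users see.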
 
Upvote
62 (68 / -6)
eval is just a convenient example. This exploit works just fine in languages without it.
It looks like the language isn't interpreting the extended characters as ASCII directly. There is a loop that modifies the characters to shift them back into the ASCII range. A similar exploit might be able to write the decoded characters into a file and load the file, but it doesn't look like a function definition could be directly created with the technique.
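To illustrate the shape of such a loop (a hypothetical reconstruction, not the actual malware; the byte-to-selector mapping here is an assumption), each invisible variation selector can carry one byte, and a short decoder shifts it back down into the ASCII range before the result is handed to eval:

```javascript
// Hypothetical scheme: bytes 0-15 -> U+FE00..U+FE0F, bytes 16-255 -> U+E0100..U+E01EF.
// Every output character is an invisible variation selector.
function hideInSelectors(ascii) {
  let out = "";
  for (const ch of ascii) {
    const b = ch.codePointAt(0); // assumes ASCII input
    out += String.fromCodePoint(b < 16 ? 0xfe00 + b : 0xe0100 + (b - 16));
  }
  return out;
}

// The matching decoder: shifts each selector back down to a byte.
function revealFromSelectors(hidden) {
  let out = "";
  for (const ch of hidden) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xfe00 && cp <= 0xfe0f) out += String.fromCharCode(cp - 0xfe00);
    else if (cp >= 0xe0100 && cp <= 0xe01ef) out += String.fromCharCode(cp - 0xe0100 + 16);
  }
  return out;
}

const payload = hideInSelectors("console.log('pwned')");
// `payload` renders as an empty string in most editors, yet
// eval(revealFromSelectors(payload)) would run the hidden code.
```

The string literal itself does nothing; it is the innocuous-looking decoder plus eval that turns it into code.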
 
Upvote
44 (44 / 0)

FelipeBG

Smack-Fu Master, in training
76
Subscriptor
Good old eval, always popping up in new fun ways. Seems like any code that uses eval should be flagged and auto-quarantined across the board
It does get flagged in npm if you click "Analyze security with Socket" -> "Alerts" from a package page. I don't know how long it's been doing that though.
 
Upvote
6 (6 / 0)

HamHands_

Ars Centurion
213
Subscriptor
How hard would it be for Github, NPM, etc. to check for and flag these somehow? "Hey, this source code has invisible characters in it, look out!"
It costs money to do that sort of scan. They could offer it gratis to open source projects; Microsoft certainly has enough money to do it. But it seems unlikely.

The solution from our (the dev) side is to incorporate scanning tools like SonarCloud, which do this for us, into our workflow. I haven't checked for this exact issue, but Google tells me there is a rule for flagging invisible characters.
 
Upvote
9 (10 / -1)

HamHands_

Ars Centurion
213
Subscriptor
The best way to protect against the scourge of supply-chain attacks is to carefully inspect packages and their dependencies before incorporating them into projects.
This is true but nearly impossible at the moment. NPM (Node.js) is a horrifying nest of dependencies on dependencies on dependencies. The left-pad incident highlighted this, so there's been a decent amount of wrangling of the problem with tools like npm audit, but fundamentally my node_modules folder is still going to hold hundreds of dependencies even if I only install big-name, "first party" packages (React/Angular/etc.). I don't think this is going to be solved: IIRC the ethos of NPM was to make sharing code trivial, so you can get going faster by leveraging this huge repository of libraries instead of writing it yourself. It's possible things could get a little better as more functionality gets added to the standard library of JS. E.g., I used to have to install a third-party package like axios for making network requests with a nice API, but now fetch is standard in the browser and in Node (mostly), so I don't have to.

But still, security is one of the reasons (there are many) I usually recommend not using NodeJS/NPM where possible. Restrict its usage to the frontends alone. Though, unfortunately, the industry as a whole seems to be moving in the opposite direction: very tightly coupled front and backends, written exclusively in JS.
 
Upvote
24 (25 / -1)

HamHands_

Ars Centurion
213
Subscriptor
I'm a hobbyist rather than a professional developer, but my tools of choice allow me to change the typeface(s) used to display code (my longtime preference is Verdana). Would it not be relatively easy to create and use a typeface where all the "invisible" character codes become visible?
Should be a little easier than that. VSCode at least highlights these invisible characters. I'm sure other editors do as well. Granted, I haven't tested to see if the exploit code in this article shows up.
 
Upvote
31 (31 / 0)

nxg

Ars Centurion
221
Subscriptor
It looks like the language isn’t interpreting the extended characters as ascii directly. There is a loop that is modifying the characters to shift them back into the ascii range.
This seems correct, to me.

If I'm understanding the attack correctly (and I'm fairly sure I am), it consists of encoding malicious code in an otherwise entirely normal, or at least abnormal-but-legitimate, UTF-8 string within the code, which decodes to code which is then executed.

It is functionally equivalent to, say, ‘encrypting’ that malicious code in a ROT13 string, and including an inline ROT13 decoder, which is run on the string before executing it. The ‘encryption’ here is barely more sophisticated than that. The only difference – and it's a crucial one – is that any reviewer would surely notice a sodding big block of ROT13 code in a patch, whereas in this case (I would lay money) most editors and renderers would display the block of ‘encrypted’ code as an empty string, which is easy to miss.

The clever thing is that the editors are not malfunctioning when doing this, and any attempt to make them display the characters would potentially count as a bug. Even if a code-reviewer thought that the eval looked weird, they'd have to work through the decoder, and know a relatively unusual amount about Unicode, in order to work out what was going on.

The codepoints in question are actually not in any of the Unicode ‘private use areas’ (despite what the article suggests; and yes, I think it's ‘private use area’ that's intended, since there's no term ‘public use area’). The codepoint ranges U+FE00 to U+FE0F and U+E0100 to U+E01EF are ‘variation selectors’. I'm moderately familiar with the Unicode spec and... I've never heard of them before! There's a handy Wikipedia page which tells us that they exist in order to do funky things to preceding CJK characters in selected East Asian languages. You'd have to go head-first into the Unicode spec for the details (rather you than me), but I wouldn't be at all surprised if the required rendering behaviour for these in certain circumstances is... to show nothing.

That is, this apparently isn't exploiting any UTF-8 decoding bugs, or Unicode manipulation edge-cases. It seems quite likely that the rendering behaviour of these codepoints in strings is specified, and any editor which displayed the strings as other than empty ones might well be defective.

What a clever hack! Bastards.
 
Upvote
119 (120 / -1)

norton_I

Ars Praefectus
5,867
Subscriptor++
eval is just a convenient example. This exploit works just fine in languages without it.

Sort of. But at least the code-execution attack does require some way to execute strings. It could be eval(), it could be system() or execve(). The PUA characters would be meaningless in the body of the program. In fact, they are meaningless in the string literal until the attacker-produced function translates them down to the basic ASCII set. It's an innocuous-looking function, but the fact that the result is passed to eval should raise eyebrows if anyone looked at it.

Of course, every non-trivial language has some way to execute strings, either internally like eval() or externally like system(). The point is not that "JavaScript is weak because it has eval"; the point is that in any language these functions are well known and should have extra scrutiny applied.
 
Upvote
22 (22 / 0)

adamsc

Ars Praefectus
4,279
Subscriptor++
I'm surprised that editors and terminals render those characters as blank, instead of inserting a □ missing/unsupported character or an � invalid/unrecognizable character. (Edit: or the  object replacement character. Another choice would be 􏿮 .notdef character that is used when the font is missing the character.) A quick search shows that most programming languages support a wide range of unicode characters (E.g., the private use range in some version of Clang C++). Unicode has a technical standard for mitigating some of the problems: https://www.unicode.org/reports/tr55/

Personally, I avoid non-ascii characters in my code, with unicode strings the only exception.

It depends on the system and tools you're using, which is what makes it nasty. For example, the Unicode private use area (planes 15 and 16) variant of the attack does not work on macOS (Terminal, popular editors, etc.) because those characters fall back to the .LastResort system font and display as a square with a question mark inside.

Problem solved, time to buy some AAPL before telling everyone to switch to Macs, right? Nope.

There are a lot of other characters in Unicode which do not render visibly because they're required not to. Other variations of this attack used things like the Mongolian variation selectors and vowel separator (U+180B–E, which may or may not render depending on the active font!), joiners, the right-to-left / left-to-right embedding and override characters, etc.

There are already many tools and editors which will warn about suspicious mixing of language blocks (added in response to phishers doing cute things with Cyrillic letters), and those will often flag Mongolian formatting codes used in an otherwise non-Mongolian context. But that's still not enough, because technically you could just encode in binary using things like “ ” and “ ” (EN SPACE and EM SPACE, respectively), which are considered language-neutral.

There are some libraries, like https://github.com/lirantal/anti-trojan-source and https://docs.astral.sh/ruff/rules/ambiguous-unicode-character-string/, which implement layers of rules looking for things like that, but at some point this is also going to need to fall back on detecting the way these things are misused. That should buy us some time, because there really aren't cases where you need a gigantic string with no printing characters, but you'd also have to look for things like runs of paired RTL-LTR values which are syntactically meaningless but could be used to hide information in a not-entirely-empty string.
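A sketch of that kind of misuse heuristic (the thresholds and the set of characters treated as invisible are my assumptions, not what anti-trojan-source or Ruff actually implement): flag any long string whose visible length is tiny compared to its code-unit length.

```javascript
// Heuristic sketch: flag string values that are long in code units but render
// almost nothing visible. Threshold values here are arbitrary assumptions.
const INVISIBLE =
  /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/gu;

function looksLikeHiddenPayload(literal, minLength = 20, maxVisibleRatio = 0.1) {
  if (literal.length < minLength) return false;
  const visible = literal.replace(INVISIBLE, "");
  return visible.length / literal.length <= maxVisibleRatio;
}
```

A rule like this catches the "gigantic string with no printing characters" case while leaving ordinary text, and even ordinary emoji, alone.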

It's more than just eval‌(‌): the act of decoding an embedded string constant and passing it into an open or execution function is inherently suspicious, and we have a growing number of control-flow analysis tools which we realistically have to put into every code review tool, since you'd also want to catch things like using more normal Unicode to load a payload from a public blockchain or other hard-to-remove outside hosting service. Making every path which populates a variable passed to a sensitive function really prominent would be a good win for multiple reasons.

EDIT: in a truly hilarious bit of synchronicity, I have learned that if you put the literal eval followed by the opening parenthesis into a comment here, the server will reject it, but eval\N{ZERO WIDTH NON-JOINER}(\N{ZERO WIDTH NON-JOINER} will bypass that check. If you tried to exploit a Python system that way, it'd fail with a “SyntaxError: invalid non-printable character” exception, but that would totally work against Node.js…
 
Last edited:
Upvote
35 (35 / 0)

norton_I

Ars Praefectus
5,867
Subscriptor++
Not being a developer, I wonder what legitimate uses are there for code points from this plane in source code at all?

Could sed or a more customized tool identify it and strip it out?

It would be trivial to strip them out. But since they can appear in string literals, doing so would break any application which was encoding them to, e.g., display as part of text to a user.

These are non-printing characters used to change how surrounding characters are rendered. It would probably be acceptable for a programming language to prohibit them in source code, even in string literals. For instance, if you wanted text strings with advanced Unicode control characters, you would more likely load them from a language-dependent template or database; you could make that a requirement in a specific language.

However, applying that unilaterally in existing code across multiple languages risks breaking stuff that's working. You would want to audit each instance (hopefully rare) for legitimate uses before deciding on that.
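A quick illustration of that breakage risk (the sanitizer and the emoji string are my examples, not from the thread): a naive strip of variation selectors also mangles legitimate string literals.

```javascript
// Naive sanitizer: removes all variation selectors from source text.
function stripVariationSelectors(text) {
  return text.replace(/[\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/gu, "");
}

const ui = 'const LABEL = "\u2764\uFE0F Favorites";'; // heart + VS16 (emoji presentation)
const stripped = stripVariationSelectors(ui);
// The hidden-payload case is neutralized, but this legitimate string lost
// its VS16 and may now render as a plain text heart instead of an emoji.
```

This is why auditing each instance, rather than stripping unilaterally, is the safer default.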
 
Upvote
6 (6 / 0)

Wandering Monk

Ars Centurion
266
Subscriptor
This seems like an extremely simple linting rule: if there are these invisible characters, fail the build.

Also, for NPM projects, add a flag (on by default after a couple releases) that strips out the invisible characters during the “build” (the vast majority of npm projects use something like webpack to “compile” the JS).

In both cases, if there’s a legitimate reason to have these invisible characters, just require them to have an annotation that effectively screams, “this string has invisible characters!”

Now that the cat is out of the bag, I expect it to be mitigated pretty quickly.
 
Upvote
0 (7 / -7)

Chai T. Rex

Wise, Aged Ars Veteran
152
Even if a code-reviewer thought that the eval looked weird, they'd have to work through the decoder, and know an relatively unusual amount about Unicode, in order to work out what was going on.
To totally work out what's going on, sure, but a good starting step is to replace eval with console.log or something like that.
 
Upvote
4 (4 / 0)
In this case the problem is specifically that the non-printing characters are in a Unicode string literal.
Evaluating string literals as code is a very well known vulnerability. I assumed the problem was similar to the following obfuscated C-code:
View: https://youtu.be/RMI5oT9U4vc?t=2m32s
where invisible characters are used in the code itself and not just in string literals.
 
Upvote
7 (7 / 0)

arobert3434

Ars Scholae Palatinae
1,161
Subscriptor
eval is just a convenient example. This exploit works just fine in languages without it.
How? There's no compiler or interpreter that would treat those characters as ASCII directly. You need something to translate them, and then a runtime that supports evaluation of code from data.
 
Upvote
4 (7 / -3)