Unicode that's invisible to the human eye was largely abandoned—until attackers took notice.
Good old eval, always popping up in new fun ways. Seems like any code that uses eval should be flagged and auto-quarantined across the board.
eval is just a convenient example. This exploit works just fine in languages without it.
Standard since 2025 North of the Gulf of America. To be used exclusively for any DoW projects, with support from Greater Trump Numerals 0-9.
The US alphabet??
So much for Rome and 2500 years.
Yes, that seems easy to automate. It would take a huge amount of processing to check millions of files, but worth it to remove the uncertainty.
How hard would it be for GitHub, NPM, etc. to check for and flag these somehow? "Hey, this source code has invisible characters in it, look out!"
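For a sense of scale, here is a rough sketch of what such a check could look like in Python. The set of code points treated as suspicious (Unicode category Cf "format" characters, private-use characters, and the two variation selector ranges) is an assumption made for illustration, not the rule any registry or scanner actually uses, and `is_suspicious` and `find_invisible` are hypothetical names.

```python
import unicodedata

# Assumption: these ranges and categories are an illustrative guess at what is
# worth flagging; they are not taken from any real scanner's rule set.
VARIATION_SELECTORS = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]


def is_suspicious(ch: str) -> bool:
    """Return True for characters that usually render as nothing at all."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in VARIATION_SELECTORS):
        return True
    # Cf = format characters (zero-width joiners, tag characters, ...),
    # Co = private-use characters.
    return unicodedata.category(ch) in ("Cf", "Co")


def find_invisible(text: str):
    """Yield (line, column, 'U+XXXX') for each suspicious character."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if is_suspicious(ch):
                yield line_no, col, f"U+{ord(ch):04X}"


if __name__ == "__main__":
    sample = "const s = 'a\u200cb';\nconsole.log(s);"
    for hit in find_invisible(sample):
        print(hit)  # (1, 13, 'U+200C')
```

Since plenty of legitimate strings contain these characters, a check like this is probably better at reporting locations for a human to look at than at rejecting files outright.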
Programming languages often contain string constants which can contain arbitrary Unicode. There are many invisible Unicode characters, and some of them are necessary for properly rendering text in some human scripts. Blanket disallowing invisible characters would cause internationalization chaos.
With the caveat that all my formal education on coding comes from law classes, I cannot wrap my head around how this weakness exists.
I cannot be the only person in the entire world who has had code fail to compile because of one little typo in a variable or because I forgot to end bracket a clause.
But invisible unicode? That compiles just fine? Sure, the problem could be ingenious hackers, it could be LLMs, or maybe the problem is compilers written by people so gullible they use phrases like "zero trust" unironically.
This just seems like such an obvious behavior to never implement, under any circumstances, I don't care how many drinks you've had you'll regret it in the morning. Not "whoopsie, how'd we let it run that code".
Seriously, does anyone in IT do policy & planning and view "chaotic neutral" as more of an orientation than an alignment?
It looks like the language isn’t interpreting the extended characters as ASCII directly. There is a loop that is modifying the characters to shift them back into the ASCII range. A similar exploit might be able to write the decoded characters into a file and load the file, but it doesn’t look like a function definition could be directly created with the technique.
eval is just a convenient example. This exploit works just fine in languages without it.
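The shift-back loop described here can be sketched in a few lines of Python. This is an illustration only: it assumes the payload is hidden in Unicode tag characters (U+E0000 plus the ASCII value), which may not be the encoding the article's sample used, and the names `hide` and `reveal_hidden` are hypothetical. It prints the decoded text instead of evaluating it.

```python
TAG_BASE = 0xE0000  # assumption: the ASCII value is added to this offset


def hide(ascii_text: str) -> str:
    """Map each ASCII character to an invisible tag character."""
    return "".join(chr(TAG_BASE + ord(c)) for c in ascii_text)


def reveal_hidden(text: str) -> str:
    """Shift tag characters back into the ASCII range; drop everything else."""
    out = []
    for ch in text:
        offset = ord(ch) - TAG_BASE
        if 0x20 <= offset <= 0x7E:  # printable ASCII only
            out.append(chr(offset))
    return "".join(out)


if __name__ == "__main__":
    carrier = "looks empty:" + hide("print('hi')")
    print(len(carrier))            # longer than the visible text suggests
    print(reveal_hidden(carrier))  # decoded string is only data at this point
```

The decoded string is inert until something evaluates it, writes it to a file that gets loaded, or otherwise feeds it back to a runtime.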
It does get flagged in npm if you click "Analyze security with Socket" -> "Alerts" from a package page. I don't know how long it's been doing that though.
Good old eval, always popping up in new fun ways. Seems like any code that uses eval should be flagged and auto-quarantined across the board
It costs money to do that sort of scan. They could offer it gratis to open source projects; Microsoft certainly has enough money to do it. But it would be unlikely.
How hard would it be for GitHub, NPM, etc. to check for and flag these somehow? "Hey, this source code has invisible characters in it, look out!"
This is true but nearly impossible at the moment. NPM (NodeJS) is a horrifying nest of dependencies on dependencies on dependencies. The LeftPad incident highlighted this, so there's been a decent amount of wrangling of this problem with tools like npm audit, but fundamentally my node_modules folder is still going to be 100s of dependencies even if I only install big name, "first party" packages (React/Angular/etc). I don't think this is going to be solved, as IIRC the ethos of NPM was to make sharing code trivial, so you can get going faster by leveraging this huge repository of libraries instead of writing it yourself. It's possible things could get a little better as more stuff gets added to the standard library of JS. E.g.: I used to have to install a 3rd party package like axios for making network requests with a nice API, but now fetch is standard in Browser and in Node (mostly) so I don't have to.
The best way to protect against the scourge of supply-chain attacks is to carefully inspect packages and their dependencies before incorporating them into projects.
Should be a little easier than that. VSCode at least highlights these invisible characters. I'm sure other editors do as well. Granted, I haven't tested to see if the exploit code in this article shows up.
I'm a hobbyist rather than a professional developer, but my tools of choice allow me to change the typeface(s) used to display code (my longtime preference is Verdana). Would it not be relatively easy to create and use a typeface where all the "invisible" character codes become visible?
and Node et al filtering these characters out of strings.
Good old eval, always popping up in new fun ways. Seems like any code that uses eval should be flagged and auto-quarantined across the board
This seems correct, to me.
It looks like the language isn’t interpreting the extended characters as ASCII directly. There is a loop that is modifying the characters to shift them back into the ASCII range.
Even if a code-reviewer thought that the eval looked weird, they'd have to work through the decoder, and know a relatively unusual amount about Unicode, in order to work out what was going on.
eval is just a convenient example. This exploit works just fine in languages without it.
As your resident BofH, there's an easy way to find out.
The code shown in the article does not contain the invisible characters, does it?
Personally, I avoid non-ascii characters in my code, with unicode strings the only exception.
I'm surprised that editors and terminals render those characters as blank, instead of inserting a □ missing/unsupported character or a � invalid/unrecognizable character. (Edit: or the U+FFFC object replacement character. Another choice would be the .notdef character that is used when the font is missing the character.) A quick search shows that most programming languages support a wide range of Unicode characters (e.g., the private use range in some version of Clang C++). Unicode has a technical standard for mitigating some of the problems: https://www.unicode.org/reports/tr55/
Personally, I avoid non-ascii characters in my code, with unicode strings the only exception.
eval\N{ZERO WIDTH NON-JOINER}(\N{ZERO WIDTH NON-JOINER} will bypass that check. If you tried to exploit a Python system that way, it'd fail with a “SyntaxError: invalid non-printable character” exception but that would totally work against Node.js…
Not being a developer, I wonder what legitimate uses are there for code points from this plane in source code at all?
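The Python half of that claim is easy to check with the built-in `compile`. A small sketch: the exact wording of the error message varies between Python versions, so only the exception type is checked, and `python_accepts` is a hypothetical helper name. The Node.js half is not tested here.

```python
# eval + ZERO WIDTH NON-JOINER + ( + ZERO WIDTH NON-JOINER, as in the comment
# above, followed by an argument and a closing parenthesis so that it would be
# a complete expression if the invisible characters were accepted.
SOURCE = "eval\N{ZERO WIDTH NON-JOINER}(\N{ZERO WIDTH NON-JOINER}'1 + 1')"


def python_accepts(src: str) -> bool:
    """Return True if Python can compile src as an expression (nothing is run)."""
    try:
        compile(src, "<sample>", "eval")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(python_accepts("eval('1 + 1')"))  # True
    print(python_accepts(SOURCE))           # False
```

Note this only covers invisible characters sitting in the code itself. The same characters inside a string literal compile without complaint.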
Could sed or a more customized tool identify it and strip it out?
Also, for NPM projects, add a flag (on by default after a couple releases) that strips out the invisible characters during the “build”
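A customized tool for this would not need to be much. Below is a minimal Python sketch of a build-time filter, assuming only variation selectors and tag characters should be stripped. That choice is a guess for illustration: a blanket filter on every invisible character would also remove joiners that some scripts need, and even this narrow set can change how some emoji render. `strip_hidden` is a hypothetical name, not an existing npm feature.

```python
import re

# Assumption: strip only variation selectors (U+FE00-U+FE0F, U+E0100-U+E01EF)
# and tag characters (U+E0000-U+E007F). Zero-width joiners and similar
# characters are left alone because legitimate text can depend on them.
HIDDEN = re.compile("[\uFE00-\uFE0F\U000E0000-\U000E007F\U000E0100-\U000E01EF]")


def strip_hidden(text: str) -> tuple[str, int]:
    """Return the cleaned text and the number of characters removed."""
    return HIDDEN.subn("", text)


if __name__ == "__main__":
    cleaned, removed = strip_hidden("let s = '\U000E0041\U000E0042ok';")
    print(cleaned, removed)  # let s = 'ok'; 2
```

Returning the count as well as the cleaned text lets a build step warn loudly when it removed something, rather than silently rewriting the source.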
npm install something beyond a hello world app ventures into the thousands, vetting every single one for invisible unicode and odd filenames seems... not particularly feasible. At least not for a human.
To totally work out what's going on, sure, but a good starting step is to replace eval with console.log or something like that.
Even if a code-reviewer thought that the eval looked weird, they'd have to work through the decoder, and know a relatively unusual amount about Unicode, in order to work out what was going on.
Evaluating string literals as code is a very well known vulnerability. I assumed the problem was similar to the following obfuscated C-code:
In this case the problem is specifically that the non printing characters are in a Unicode string literal.
How? There's no compiler or interpreter that would treat those characters as ASCII directly. You need something to translate them and then a runtime that supports evaluation of code from data.
eval is just a convenient example. This exploit works just fine in languages without it.