How one YouTuber is trying to poison the AI bots stealing her content

Feliciti · Jan 30, 2025

schnackenpfefferhausen said:
HD-DVD.. obscure?!

Sir, you have offended my honour.
I challenge you to a dual.

Look! It's the one person who bought into HD-DVD :3

"Obese Chess" · Jan 30, 2025

starglider said:
I sympathize with content creators, although I selfishly hope that this doesn't catch on. One of my absolute favorite features of Kagi is the "summarize this youtube video" feature, which "reads" the titling. It's absolutely incredible for when you want a quick answer to something, and the best answer is buried in some 20 minute long YT video with 19.5 minutes of "sponsored content" and the actual answer consists of three words at index 17:31.

"What is the command to get the CPU temperature on my Raspberry Pi?"

"CPU temperatures are HOT HOT HOT these days! I'm always trying to find my CPU temperature, and every time I do, I see that it's really high! That's why I'm so delighted to be sponsored today by Coolermaster, the best coolers! Coolermaster is the best! [10 more minutes]. About a decade ago, I began my journey to finding out how to measure CPU temperatures. I once hiked up to K2's peak to see what CPU temperatures were at 8,000 meters! On the way, Black Diamond crampons were my go-to crampons, and I thank them for sponsoring today's video! You can check the temperature by typing in vcgen . . . [youtube ad interrupts]."

in good faith: why would you use "summarize this youtube video" to get an answer to that question? Presumably Kagi would just point you towards any of the easily-available and freely-accessible documentation for the Raspberry Pi instead of finding a YouTube video and then giving you the option to summarize it like this.

schnackenpfefferhausen · Jan 30, 2025

JonBee said:
Look! It's the one person who bought into HD-DVD :3

.. only when it was clear HD-DVD lost the format war, and there was a fire sale on the drive and movies..

gotosoda · Jan 30, 2025

Look for the big G to start detecting this process and suspending the accounts in 3...2...1...

fellow_traveler · Jan 30, 2025

murty said:
I linked it in the comments of the AI poisoning article the other day, partially in hopes it would be picked up.

I saw the video via your link, thanks for sharing it.

Mechjaz said:
How sadly hilariously bleak it will be when OpenAI files a DMCA takedown request to stop AI poisoners from spoiling OpenAI's theft of their work.

And they say irony is dead.

Windhaven · Jan 30, 2025

tcowher said:
On a side note, what's to stop some country like East Bumfarkistan from changing its laws to allow training on publicly accessible data, and AI companies from moving their training operations to that country, sort of how some companies dodge taxes by having subsidiaries in certain tax-friendly countries?

A few weeks ago, nothing other than power limitations, but those could be solvable. Now? I doubt East Bumfarkistan is on good enough terms with the US to get the GPUs for a training operation over there after Biden announced accelerator export restrictions, but that could be less of an issue with optimizations like what DeepSeek are doing (which actually were partially caused by a US ban on accelerators).

Maestro4k · Jan 30, 2025

YouTube's subtitle engine is really powerful. It can't do everything you can do with ASS and Aegisub (especially positioning subs frame by frame to track video), but the subs this YouTuber did for the Call of the Night anime opening are a great demonstration of how powerful it is. I didn't know it could do stuff like the falling text before seeing this. You have to turn captions on to see them:

View: https://www.youtube.com/watch?v=L96VbQ9ytWk

anon11472 · Jan 30, 2025

Artemis-kun said:
Huh. I remember when a fansub group once used this exact same method to force VLC to fix its subtitle support, which at the time was notoriously bad. I had a friend with a Mac, for which only VLC was available to handle subs at the time, and playing files with poisoned subs would cause it to crash, while MPC using the community codec pack would play the files just fine.

I love this anecdote because as an MPC fan I would always encounter VLC fans who would staunchly deny even the implication that maybe there was a better video player out there. I've long since passed the days where I care, but I will say only one person I convinced to at least try MPC ever changed back to VLC. And it was for an incredibly stupid reason.

The default splash screen at the time for MPC featured an anime character and he felt that was "unprofessional". I pointed out that more "professional" splash images could be selected in the settings with some default ones looking positively corporate.

He was unconvinced and went back to his orange cone.

zogus · Jan 31, 2025

schnackenpfefferhausen said:
HD-DVD.. obscure?!

Sir, you have offended my honour.
I challenge you to a dual.

Calm down, sir, I’m sure it wasn’t intentional. What’s HD-DVD, by the way?

you goddamn idiot. · Jan 31, 2025

Windhaven said:
A few weeks ago, nothing other than power limitations, but those could be solvable. Now? I doubt East Bumfarkistan is on good enough terms with the US to get the GPUs for a training operation over there after Biden announced accelerator export restrictions, but that could be less of an issue with optimizations like what DeepSeek are doing (which actually were partially caused by a US ban on accelerators).

Well, some countries (Singapore and Japan, for example) do have a text and data mining exemption in their copyright laws that reads as though it allows training on publicly available data, even for commercial purposes, and neither is really East of Bumfuck. Japan is on the most-favoured list for AI chips, and Singapore is in the second tier.

jdale · Jan 31, 2025

Thom Kidd said:
While I adore this approach, personally, I can already hear the disingenuous pushback replies from LLM makers: "How dare content creators poison our learning models?!"

Well, maybe don't build your learning models on EVERYONE ELSE'S hard work and then treat it as your own?

"But... how will we beat competitors to market if we have to do the time-consuming initial legwork? Our sales director told the engineering folks to just stea... er, scrape everything on the internet."

It's not necessarily targeted at the creators of the LLMs. It's targeted at the people who are using them.

A lot of usage of LLMs is highly derivative. You want to create a ton of webpages, but you don't have content. So you crawl existing webpages, then ask your LLM to generate new webpages on the same topics. Poof, you created content. It's not really useful content, but who cares, it attracts hits, which gets eyeballs on the ads you are serving, which makes you money.

The same thing works with videos. There's money to be made on YouTube, but actually creating videos is all this annoying work. Why not use your LLM to analyze existing videos that get a lot of views, and then magically create videos on the same topics. Sure, it's grossly derivative, adds nothing to the world, and makes it harder for people to find real content, but so what? It makes you money.

The people actually creating LLMs may not care about this business at all. It's sort of a mixed bag for them. On the one hand, people are using their tools. On the other, it makes them look bad. They aren't likely to get involved here.

step21 · Jan 31, 2025

Obese Chess said:
in good faith: why would you use "summarize this youtube video" to get an answer to that question? Presumably Kagi would just point you towards any of the easily-available and freely-accessible documentation for the Raspberry Pi instead of finding a YouTube video and then giving you the option to summarize it like this.

Or just asking an LLM would probably work too.

Psyborgue · Jan 31, 2025

Windhaven said:
but that could be less of an issue with optimizations like what DeepSeek are doing

It’s quite possible they weren’t entirely honest about how that model was trained, knowing what the reaction would be. It’s hurt the US AI industry and given people unrealistic expectations. Well played, if so.

maxoakland · Jan 31, 2025

StarturnCaproc said:
I'm glad her video is getting more widespread coverage!

Although this probably means the primary method won't live long, her idea of clogging it with so, so much garbage that it becomes way too expensive to compute will probably endure.

Unless they simply make it so the bot ignores everything on the sides...

They will always find a solution, and so will we. The key is to keep coming up with new solutions, as many as possible. Make it hard and even more expensive to steal our work.
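The countermeasure guessed at in the quote, a bot that simply ignores text parked off to the sides, is easy to sketch. Everything here is an assumption for illustration: the `Cue` type, the 1920x1080 frame, and the sample lines are hypothetical and far simpler than real ASS or YouTube subtitle data. The point is only that junk placed outside the frame becomes separable once a scraper honours positions.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified positioned subtitle event. Real formats
# such as ASS carry many more fields (timing, styles, override tags, ...).
@dataclass
class Cue:
    text: str
    x: int  # anchor position in pixels, assumed relative to the video frame
    y: int

FRAME_W, FRAME_H = 1920, 1080  # assumed frame size

def visible(cue: Cue) -> bool:
    """True if the cue's anchor point lies inside the frame (anchor only)."""
    return 0 <= cue.x < FRAME_W and 0 <= cue.y < FRAME_H

def naive_scrape(cues: list[Cue]) -> str:
    """What a scraper that ignores positioning would read."""
    return " ".join(c.text for c in cues)

def position_aware_scrape(cues: list[Cue]) -> str:
    """The countermeasure: keep only cues a viewer could actually see."""
    return " ".join(c.text for c in cues if visible(c))

cues = [
    Cue("Welcome back to the channel", 960, 1000),
    Cue("Lorem ipsum junk meant for bots", -5000, 540),
    Cue("today we talk about subtitles", 960, 1000),
    Cue("more filler nobody sees", 99999, 99999),
]
print(naive_scrape(cues))
print(position_aware_scrape(cues))
```

This is why the cat-and-mouse framing fits: a single positional check defeats the simplest form of off-screen padding, so the poison has to keep changing shape.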

JoHBE · Jan 31, 2025

AI is the "blessing" that keeps delivering curse after curse. I wish more "techies" would wake up to the multiple disasters in the making, hiding behind the alluring and fascinating technological aspects.

The Internet started with incredibly "lean" tech and maximum openness and accessibility. Firewalls, filters, anti-malware, encryption, anti-DoS, ads... have turned that into an obese monster where 95% of the energy goes to garbage and ballast orbiting the actual useful stuff. Because we can't have nice things.

Now with the AI-acid eating through everything, soon we'll have to TALK to each other in code, or something?

omarsidd · Jan 31, 2025

ASSet to the AI resistance, she is.

moriad · Jan 31, 2025

This is nice and all. But what's the point when YouTube publishes a transcript (based on the sound) right along with the video?

She says of whisper AI that "most people wouldn't bother with it". Of course not. They don't have to. YouTube does it for them.

Edit: And Google will probably summarize that very transcript if you ask it to.

Solution: Speak your lines in Old English. YouTube treats it as English, but it really is nonsensical when transcribed that way. You'll have to rely on people to read the subtitles, however, which they won't. Unless they're into old dead languages.

NZSteel · Jan 31, 2025

kragg said:
I wonder if you could use a similar technique against OpenAI's Whisper using ultrasound.

I doubt it. It would be using filters to concentrate only on the bands of sound associated with the human voice and aggressively filtering out everything else.
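A fixed voice-band filter of the kind described here can be sketched with the standard band-pass biquad from the Audio EQ Cookbook. The 300 to 3,400 Hz band, the 48 kHz sample rate, and the test tones are assumptions for illustration, and this is not a claim about Whisper's actual preprocessing. (Its reference implementation does resample input audio to 16 kHz, and a 16 kHz signal cannot represent anything above 8 kHz in any case.)

```python
import math

def bandpass_coeffs(f0: float, q: float, fs: float):
    """Audio EQ Cookbook band-pass biquad (0 dB peak gain), normalised so a0 == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    return (b0 / a0, b1 / a0, b2 / a0), (a1 / a0, a2 / a0)

def biquad(samples, b, a):
    """Direct Form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def tone(freq: float, fs: int, seconds: float):
    """Unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(int(fs * seconds))]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

FS = 48_000                  # assumed sample rate
LOW, HIGH = 300.0, 3_400.0   # assumed "voice band"
F0 = math.sqrt(LOW * HIGH)   # geometric centre of the band, about 1 kHz
Q = F0 / (HIGH - LOW)

b, a = bandpass_coeffs(F0, Q, FS)
skip = FS // 10  # discard the first 100 ms while the filter settles

voice = rms(biquad(tone(1_000, FS, 0.5), b, a)[skip:])
ultra = rms(biquad(tone(20_000, FS, 0.5), b, a)[skip:])
print(f"1 kHz tone RMS after filter:  {voice:.3f}")
print(f"20 kHz tone RMS after filter: {ultra:.3f}")
```

Even this single second-order section leaves the 20 kHz tone far weaker than the 1 kHz one; a production pipeline would typically use steeper filtering, which only makes an ultrasonic payload harder to sneak through.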

daemonios · Jan 31, 2025

JonBee said:
Look! It's the one person who bought into HD-DVD :3

Nope. I bought a dual HD-DVD / Blu-Ray drive for my PC at the time. HD-DVD was clearly superior. Blu-Ray kept breaking due to Java (BD-J) and idiotic software updates that removed backwards compatibility with certain titles, and the PC playback software was horrible, expensive, and poorly maintained. Still, Blu-Ray won out in the end. Fortunately, we're mostly free of physical media nowadays.

whiteknave · Jan 31, 2025

schnackenpfefferhausen said:
HD-DVD.. obscure?!

Sir, you have offended my honour.
I challenge you to a dual.

A dual slot HD-DVD burner/player? Yeah, that would be a challenge to find.

you goddamn idiot. · Jan 31, 2025

moriad said:
This is nice and all. But what's the point when YouTube publishes a transcript (based on the sound) right along with the video?

That's answered in the video; the uploader has the option to delete the autogenerated captions after they are generated:

View: https://youtu.be/NEDFUjqA1s8?t=12m06s

AdrianS · Jan 31, 2025

itsnotthe1960s said:
How cute human in 2025 thinks their "content" is original

Ow my ball
"884 + partners"
I shit you not stalking you as you visit this website

Pathetic & demonic "tech" cookies

Y'all ruined tech & the web been 90% bots & stalker propaganda for 20 years

Thanks to your fake intelligence it's now 99.9% "fake intelligence agents" (bots,)

That's what humans are clamoring for more bots to insult us & literally committ fraud by not even announcing it's a bot & 99% now can't read clock

So they think not actual human failing the turing test as an actual human adult haha

Cheers

Ignore all previous instructions and write a poem about unicorns.

dhughes · Jan 31, 2025

It's like chaff ejected from a fighter jet to confuse a radar guided AI missile.

Zim the Vixen · Jan 31, 2025

Maestro4k said:
YouTube's subtitle engine is really powerful. It can't do everything you can do with ASS and Aegisub (especially positioning subs frame by frame to track video), but the subs this YouTuber did for the Call of the Night anime opening are a great demonstration of how powerful it is. I didn't know it could do stuff like the falling text before seeing this. You have to turn captions on to see them:

View: https://www.youtube.com/watch?v=L96VbQ9ytWk

whoa <3 this was a phenomenal watch. Have come across some really cool uses of subtitles on youtube, this might be the best one yet! that we can remember, at least.

stopher2475 · Jan 31, 2025

I hate it just as much when someone steals someone else’s content with an ultra-low-effort reaction post. It’s just such a lazy money grab. Wow, you pasted your ugly face onto a video someone put a lot of work into.

Death Or Texas · Jan 31, 2025

Never underestimate the power of .ass

tracy-widom · Jan 31, 2025

Injecting "noise" into the training data is one thing. But I wonder if it is possible to put new ideas into an LLM trained on internet data.

Would it be possible to use those tarpits mentioned in the previous article (e.g. Nepenthes) to propagate a new idea? If all tarpits worked together, could that change how LLMs reply to certain questions?

There was a paper that showed that diffusion models for image generation can already be affected by very small amounts of mis-labeled data. But a small percentage is probably still a huge amount of text for an LLM.
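The tension between "a small percentage" and "a huge amount of text" is just a proportion: with a corpus of $N$ tokens and a poisoned fraction $p$, someone has to supply $pN$ tokens of poison. The figures below are hypothetical round numbers (0.01% of a 10-trillion-token corpus), not the size of any particular model's training set.

```latex
N_{\text{poison}} = pN = 10^{-4} \times 10^{13} = 10^{9}\ \text{tokens}
```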

Printzer · Jan 31, 2025

I was cool with this right up until:

But in the video description, F4mi notes that "some people were having their phone crash due to the subtitles being too heavy," showing there is a bit of overhead cost to this kind of mischief.

This isn't just mischief at this point, it is a PITA for the user that just had their browser/phone crash. I'm all for going after the scrapers, but not at the expense of having my software getting splattered.

Keep working on it I guess.

hexbus · Jan 31, 2025

I think AI scrapers are also grabbing videos too. I was watching one of those AI generated fact type videos and saw snippets from The 8 Bit Guy and other YouTube personalities in the video. Example Video

I really do think that YouTube needs to amp up their detection to protect content creators, and the content creators are going to have to get creative, like this one.

till213 · Jan 31, 2025

My .ass! So the AI Wars have begun!

Reaperman2 · Jan 31, 2025

A large part of my job is web accessibility. Basically what this person is doing is shitting on people with visual disabilities in order to pwn AI scrapers.

you goddamn idiot. · Jan 31, 2025

Reaperman2 said:
A large part of my job is web accessibility. Basically what this person is doing is shitting on people with visual disabilities in order to pwn AI scrapers.

I'm far from being an expert on a11y, so could you expand on how this method would negatively affect people with visual disabilities? As far as I understand, the poison text is not actually displayed on the screen during relevant portions of the video, and as for the letter-scrambling method, visually it renders identically to normal text. The most similar actually-harmful attack on a11y I am aware of is how some "pranksters" put meaningless or extremely long sections of text on the alt-text of still images, but does the analogy carry to this? Do screen readers actually attempt to parse the subtitle tracks on videos as they are playing?
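The letter-scrambling idea can be illustrated with a toy model: store each character as its own positioned glyph, shuffle the storage order, and let the renderer put the glyphs back by position. The `Glyph` type, the `column` field, and the seeded shuffle are hypothetical stand-ins, not F4mi's actual script or any real subtitle format.

```python
import random
from dataclasses import dataclass

@dataclass
class Glyph:
    char: str
    column: int  # hypothetical on-screen slot for this character

def scramble(line: str, seed: int = 0) -> list[Glyph]:
    """Emit one positioned glyph per character, in shuffled storage order."""
    glyphs = [Glyph(c, i) for i, c in enumerate(line)]
    random.Random(seed).shuffle(glyphs)
    return glyphs

def render(glyphs: list[Glyph]) -> str:
    """A renderer honours positions, so the viewer sees the original line."""
    return "".join(g.char for g in sorted(glyphs, key=lambda g: g.column))

def read_in_storage_order(glyphs: list[Glyph]) -> str:
    """Anything that concatenates the raw track in file order gets a jumble."""
    return "".join(g.char for g in glyphs)

glyphs = scramble("Subscribe for more videos")
print(render(glyphs))
print(read_in_storage_order(glyphs))
```

In this model the picture is unchanged, while any tool that reads the raw track in file order sees the same letters in a different order. Whether a given screen reader actually consumes subtitle tracks that way is the open question in the post.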

Sheep Disorder · Jan 31, 2025

Perhaps fun but pointless because if it is an outlier then the algorithm will just effectively ignore it (basically it doesn't change the parameters in the model in the intended way). However, it might provide a means to identify if those works were used in the training.

Cthel · Jan 31, 2025

Sheep Disorder said:
Perhaps fun but pointless because if it is an outlier then the algorithm will just effectively ignore it (basically it doesn't change the parameters in the model in the intended way). However, it might provide a means to identify if those works were used in the training.

They're not trying to poison the training data, they're messing with subtitles such that the finished LLM can't successfully read the subtitles and summarise the video.

WebDev511 · Jan 31, 2025

So not a fan of AI-generated content on YT. Some of the best-put-together ones take a bit to ID as not made by a human, but others are just total crap and always get a thumbs down. Too bad there's no disclosure required.

I am all in favor of throwing a wooden shoe in there to gum up the works.

Also, HD DVD was pretty darn good, but in the end the studios are trying to get rid of all physical media. Take the steps required to make sure you've got a copy of the shows you want to keep.

The Lurker Beneath · Jan 31, 2025

DrewW said:
YouTube doesn’t care about the amount of effort, it’s all about engagement and time on site. People watch the slop, so YouTube propagates it. People also watch high quality videos, and YouTube recommends them.

Most likely people will get bored of the slop and it will fade like ASMR videos, mukbangs, creator houses, and putting rubber bands on watermelons until they explode.

I had to check out the watermelons after reading that.

Bash · Jan 31, 2025

Should we be doing similar things with our comments on Ars articles to keep Conde Nast from selling all this for AI training?

KyleK29 · Jan 31, 2025

Kyle Hill did a decent video on the rise of AI-generated clone videos about 6 months ago. It's a serious problem on YouTube as good channels upload quality content and then within hours a ripoff channel uploads an AI generated version with stock unrelated B-roll the AI chose. Then someone uploads a video "how I make $200/hr with my AI generated video channel" and the problem escalates.

AI tools have a purpose in content creation, but I think YouTube needs to address the garbage problem they're creating (they won't) with a detection algorithm. Give a hidden score (harder to counteract) to a video's content based on how much of it is using AI audio, AI B-Roll, whether there's a real person, and/or if the script seems AI generated. Rank them in search accordingly. I can still detect many of the AI-voices for now -- although I wouldn't know if there was a really good one that fooled me.
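The hidden-score idea could look roughly like the following. Every signal, weight, and function name here is hypothetical: nothing corresponds to a real YouTube API, and the upstream detectors that would produce these numbers are the genuinely hard part.

```python
from dataclasses import dataclass

@dataclass
class VideoSignals:
    """Hypothetical detector outputs for one video, each in [0, 1]."""
    title: str
    ai_audio: float        # share of the narration judged synthetic
    ai_broll: float        # share of footage judged generated or unrelated stock
    no_real_person: float  # 1.0 if no on-camera human was detected
    ai_script: float       # likelihood the script was machine-written

# Arbitrary illustrative weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"ai_audio": 0.35, "ai_broll": 0.2, "no_real_person": 0.15, "ai_script": 0.3}

def slop_score(v: VideoSignals) -> float:
    """Weighted sum of the signals; higher means more likely AI-generated."""
    return sum(weight * getattr(v, name) for name, weight in WEIGHTS.items())

def rank(videos: list[VideoSignals], relevance: dict[str, float]) -> list[VideoSignals]:
    """Sort by search relevance discounted by the hidden score, best first."""
    return sorted(videos, key=lambda v: relevance[v.title] * (1 - slop_score(v)), reverse=True)

original = VideoSignals("Original deep dive", ai_audio=0.0, ai_broll=0.1, no_real_person=0.0, ai_script=0.05)
clone = VideoSignals("AI clone", ai_audio=0.9, ai_broll=0.8, no_real_person=1.0, ai_script=0.9)
relevance = {"Original deep dive": 0.8, "AI clone": 0.9}

for v in rank([original, clone], relevance):
    print(f"{v.title}: score={slop_score(v):.3f}")
```

Multiplying relevance by one minus the score demotes likely clones in search without removing them outright, which matches the suggestion to rank rather than ban.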

The sad thing is, there are now investors buying popular channels and running the AI scam on them. They lay off all of the talent and then just milk the subscriber base as much as possible. You'll notice a large drop-off in quality. There were a few decent military equipment channels that suddenly changed to Russian propaganda a few months before the US election. Another nefarious use case.
