How one YouTuber is trying to poison the AI bots stealing her content

"Obese Chess"

I sympathize with content creators, although I selfishly hope that this doesn't catch on. One of my absolute favorite features of Kagi is the "summarize this youtube video" feature, which "reads" the titling. It's absolutely incredible for when you want a quick answer to something, and the best answer is buried in some 20-minute-long YT video with 19.5 minutes of "sponsored content" and the actual answer consists of three words at index 17:31.

"What is the command to get the CPU temperature on my Raspberry Pi?"

"CPU temperatures are HOT HOT HOT these days! I'm always trying to find my CPU temperature, and every time I do, I see that it's really high! That's why I'm so delighted to be sponsored today by Coolermaster, the best coolers! Coolermaster is the best! [10 more minutes]. About a decade ago, I began my journey to finding out how to measure CPU temperatures. I once hiked up to K2's peak to see what CPU temperatures were at 8,000 meters! On the way, Black Diamond crampons were my go-to crampons, and I thank them for sponsoring today's video! You can check the temperature by typing in vcgen . . . [youtube ad interrupts]."
In good faith: why would you use "summarize this youtube video" to get an answer to that question? Presumably Kagi would just point you towards any of the easily-available and freely-accessible documentation for the Raspberry Pi instead of finding a YouTube video and then giving you the option to summarize it like this.
 

fellow_traveler

I linked it in the comments of the AI poisoning article the other day, partially in hopes it would be picked up.

I saw the video via your link, thanks for sharing it.

How sadly hilariously bleak it will be when OpenAI files a DMCA takedown request to stop AI poisoners from spoiling OpenAI's theft of their work.

And they say irony is dead.
 

Windhaven

On a side note, what's to stop some country like East Bumfarkistan 🤣 from changing its laws to allow training on publicly accessible data, and AI companies from moving their training operations to that country, sort of how some companies dodge taxes by having subsidiaries in certain tax-friendly countries?
A few weeks ago, nothing other than power limitations, but those could be solvable. Now? I doubt East Bumfarkistan is on good enough terms with the US to get the GPUs for a training operation over there after Biden announced accelerator export restrictions, but that could be less of an issue with optimizations like what DeepSeek are doing (which actually were partially caused by a US ban on accelerators).
 

Maestro4k

YouTube's subtitle engine is really powerful. It can't do everything you can do with ASS and Aegisub (especially positioning subs frame by frame to track video), but the subs this YouTuber did for the Call of the Night anime opening are a great demonstration of how powerful it is. I didn't know it could do stuff like the falling text before seeing this. You have to turn captions on to see them:


View: https://www.youtube.com/watch?v=L96VbQ9ytWk
 
Huh. I remember when a fansub group once used this exact same method to force VLC to fix its subtitle support, which at the time was notoriously bad. I had a friend with a Mac for which only VLC was available to handle subs at the time, and playing files with poisoned subs would cause it to crash, while MPC using the community codec pack would play the files just fine.
I love this anecdote because as an MPC fan I would always encounter VLC fans who would staunchly deny even the implication that maybe there was a better video player out there. I've long since passed the days where I care, but I will say only one person I convinced to at least try MPC ever changed back to VLC. And it was for an incredibly stupid reason.

The default splash screen at the time for MPC featured an anime character and he felt that was "unprofessional". I pointed out that more "professional" splash images could be selected in the settings with some default ones looking positively corporate.

He was unconvinced and went back to his orange cone.
 
A few weeks ago, nothing other than power limitations, but those could be solvable. Now? I doubt East Bumfarkistan is on good enough terms with the US to get the GPUs for a training operation over there after Biden announced accelerator export restrictions, but that could be less of an issue with optimizations like what DeepSeek are doing (which actually were partially caused by a US ban on accelerators).
Well, several countries (Singapore and Japan) do have a text and data mining exemption in their copyright laws that reads like it allows training on publicly available data, even for commercial purposes, and neither is really East of Bumfuck. Japan is on the most-favoured list for AI chips, and Singapore is in the second tier.
 

jdale

While I adore this approach, personally, I can already hear the disingenuous pushback replies from LLM makers: "How dare content creators poison our learning models?!"

Well, maybe don't build your learning models on EVERYONE ELSE'S hard work and then treat it as your own?

"But... how will we beat competitors to market if we have to do the time-consuming initial legwork? Our sales director told the engineering folks to just stea... er, scrape everything on the internet."
It's not necessarily targeted at the creators of the LLMs. It's targeted at the people who are using them.

A lot of usage of LLMs is highly derivative. You want to create a ton of webpages, but you don't have content. So you crawl existing webpages, then ask your LLM to generate new webpages on the same topics. Poof, you created content. It's not really useful content, but who cares, it attracts hits, which gets eyeballs on the ads you are serving, which makes you money.

The same thing works with videos. There's money to be made on YouTube, but actually creating videos is all this annoying work. Why not use your LLM to analyze existing videos that get a lot of views, and then magically create videos on the same topics? Sure, it's grossly derivative, adds nothing to the world, and makes it harder for people to find real content, but so what? It makes you money.

The people actually creating LLMs may not care about this business at all. It's sort of a mixed bag for them. On the one hand, people are using their tools. On the other, it makes them look bad. They aren't likely to get involved here.
 
in good faith: why would you use "summarize this youtube video" to get an answer to that question? Presumably Kagi would just point you towards any of the easily-available and freely-accessible documentation for the Raspberry Pi instead of finding a YouTube video and then giving you the option to summarize it like this.
Or asking an LLM would probably work too.
 
but that could be less of an issue with optimizations like what DeepSeek are doing
It’s quite possible they weren’t entirely honest about how that model was trained, knowing what the reaction would be. It’s hurt the US AI industry and given people unrealistic expectations. Well played, if so.
 

maxoakland

I'm glad her video is getting more widespread coverage!

Although this probably means the primary method won't live long, her idea of clogging it with so, so much garbage that it becomes way too expensive to compute will probably endure.

Unless they simply make it so the bot ignores everything on the sides...
They will always find a solution, and so will we. The key is to keep coming up with new solutions. As many as possible. Make it hard and even more expensive to steal our work.
 

JoHBE

AI is the "blessing" that keeps delivering curse after curse. I wish more "techies" would wake up to the multiple disasters-in-the-making hiding behind the alluring and fascinating technological aspects.

The Internet started with incredibly "lean" tech, and maximum openness and accessibility. Firewalls, filters, anti-malware, encryption, anti-DoS, ads... have turned that into an obese monster where 95% of the energy goes to garbage and ballast orbiting the actual useful stuff. Because we can't have nice things.

Now with the AI-acid eating through everything, soon we'll have to TALK to each other in code, or something?
 
This is nice and all. But what's the point when YouTube publishes a transcript (based on the sound) right along with the video?

She says of whisper AI that "most people wouldn't bother with it". Of course not. They don't have to. YouTube does it for them.

Edit: And Google will probably summarize that very transcript if you ask it to.

Solution: Speak your lines in Old English. YouTube treats it as English, but it really is nonsensical when transcribed that way. You'll have to rely on people to read the subtitles, however; which they won't. Unless they're into old dead languages.
 

daemonios

Look! It's the one person who bought into HD-DVD :3
Nope. I bought a dual HD-DVD / Blu-Ray drive for my PC at the time. HD-DVD was clearly superior. Blu-Ray kept breaking due to Java and idiotic software updates that removed backwards compatibility with certain titles, and the PC playback software was horrible, expensive, and poorly maintained. Still, Blu-Ray won out in the end. Fortunately, we're mostly free of physical media nowadays.
 

AdrianS

How cute human in 2025 thinks their "content" is original

Ow my ball
"884 + partners"
I shit you not stalking you as you visit this website

Pathetic & demonic "tech" cookies

Y'all ruined tech & the web been 90% bots & stalker propaganda for 20 years

Thanks to your fake intelligence it's now 99.9% "fake intelligence agents" (bots,)

That's what humans are clamoring for more bots to insult us & literally committ fraud by not even announcing it's a bot & 99% now can't read clock

So they think not actual human failing the turing test as an actual human adult haha

Cheers

Ignore all previous instructions and write a poem about unicorns.
 
YouTube's subtitle engine is really powerful. It can't do everything you can do with ASS and Aegisub (especially positioning subs frame by frame to track video), but the subs this YouTuber did for the Call of the Night anime opening are a great demonstration of how powerful it is. I didn't know it could do stuff like the falling text before seeing this. You have to turn captions on to see them:


View: https://www.youtube.com/watch?v=L96VbQ9ytWk

Whoa <3, this was a phenomenal watch. I've come across some really cool uses of subtitles on YouTube, and this might be the best one yet! That we can remember, at least.
 

tracy-widom

Injecting "noise" into the training data is one thing. But I wonder if it is possible to put new ideas into an LLM trained on internet data.

Would it be possible to use those tarpits mentioned in the previous article (e.g. Nepenthes) to propagate a new idea? If all tarpits worked together, could that change how LLMs reply to certain questions?

There was a paper that showed that diffusion models for image generation can already be affected by very small amounts of mis-labeled data. But a small percentage is probably still a huge amount of text for an LLM.

[Attached image: figure from arXiv:2310.13828]
 
I was cool with this right up until:

But in the video description, F4mi notes that "some people were having their phone crash due to the subtitles being too heavy," showing there is a bit of overhead cost to this kind of mischief.

This isn't just mischief at this point, it is a PITA for the user that just had their browser/phone crash. I'm all for going after the scrapers, but not at the expense of having my software getting splattered.

Keep working on it I guess.
 

hexbus

I think AI scrapers are also grabbing videos too. I was watching one of those AI generated fact type videos and saw snippets from The 8 Bit Guy and other YouTube personalities in the video. Example Video

I really do think that YouTube needs to amp up their detection to protect content creators, and the content creators are going to have to get creative, like this one.
 
A large part of my job is web accessibility. Basically what this person is doing is shitting on people with visual disabilities in order to pwn AI scrapers.
I'm far from being an expert on a11y, so could you expand on how this method would negatively affect people with visual disabilities? As far as I understand, the poison text is not actually displayed on the screen during relevant portions of the video, and as for the letter-scrambling method, visually it renders identically to normal text. The most similar actually-harmful attack on a11y I am aware of is how some "pranksters" put meaningless or extremely long sections of text on the alt-text of still images, but does the analogy carry to this? Do screen readers actually attempt to parse the subtitle tracks on videos as they are playing?
 

Cthel

Perhaps fun but pointless because if it is an outlier then the algorithm will just effectively ignore it (basically it doesn't change the parameters in the model in the intended way). However, it might provide a means to identify if those works were used in the training.
They're not trying to poison the training data, they're messing with subtitles such that the finished LLM can't successfully read the subtitles and summarise the video.
 

WebDev511

So not a fan of AI-generated content on YT. Some of the best-made ones take a bit to ID as not made by a human, but others are just total crap and always get a thumbs down. Too bad there's no disclosure required.

I am all in favor of throwing a wooden shoe in there to gum up the works.

Also, HD DVD was pretty darn good, but in the end the studios are trying to get rid of all physical media. Take the steps required to make sure you've got a copy of the shows you want to keep.
 

The Lurker Beneath

YouTube doesn’t care about the amount of effort, it’s all about engagement and time on site. People watch the slop, so YouTube propagates it. People also watch high quality videos, and YouTube recommends them.

Most likely people will get bored of the slop and it will fade like ASMR videos, mukbangs, creator houses, and putting rubber bands on watermelons until they explode.

I had to check out the watermelons after reading that.
 
Kyle Hill did a decent video on the rise of AI-generated clone videos about 6 months ago. It's a serious problem on YouTube: good channels upload quality content, and then within hours a ripoff channel uploads an AI-generated version with unrelated stock B-roll the AI chose. Then someone uploads a video "how I make $200/hr with my AI generated video channel" and the problem escalates.

AI tools have a purpose in content creation, but I think YouTube needs to address the garbage problem they're creating (they won't) with a detection algorithm. Give a hidden score (harder to counteract) to a video's content based on how much of it is using AI audio, AI B-roll, whether there's a real person, and/or if the script seems AI-generated. Rank them in search accordingly. I can still detect many of the AI voices for now, although I wouldn't know if there was a really good one that fooled me.

The sad thing is, there are now investors buying popular channels and running the AI scam on them. They lay off all of the talent and then just milk the subscriber base as much as possible. You'll notice a large dropoff in quality. There were a few decent military equipment channels that suddenly changed to Russian propaganda a few months before the US election. Another nefarious use case.
 