Microsoft removes guide on how to train LLMs on pirated Harry Potter books

Well gee, I'm a little divided on this one. On the one hand, MS accidentally admitted exactly what LLM AI's been doing this whole time. On the other hand, pirating Harry Potter specifically is morally correct considering that woman went and used the profit she made off that series to SUCCESSFULLY pressure lawmakers in the UK to oppress a whole underclass of people.

I suppose all I can say is they're both terrible.
 
Just in case anyone is under the mistaken impression that copyright terms are reasonable, I’ll note for the record that the Harry Potter books will not enter the public domain for another CENTURY. If we’re lucky.
Have they trademarked the character names? That gives them another avenue of extending protection so long as the trademark is renewed.
 
Deleted member 221201

Guest
Why is problem? Do you not want to read glorious news of Russia Today, tovarisch?! :D

(edit, more serious response: i've had issues with archive dot is and all its flavors before when I was using cloudflare DNS, and they went away when i switched to my own recursive resolver. I think the russian fellow running the site has beef with cloudflare—it wouldn't be the first time the site admin has done screwy things).
Sir,
We would like an update on the meat packing regulations in your network and how it might inform public safety.

Recursive resolver? Sounds like kompromat.

😂
 

miken32

Ars Scholae Palatinae
861
Why is problem? Do you not want to read glorious news of Russia Today, tovarisch?! :D

(edit, more serious response: i've had issues with archive dot is and all its flavors before when I was using cloudflare DNS, and they went away when i switched to my own recursive resolver. I think the russian fellow running the site has beef with cloudflare—it wouldn't be the first time the site admin has done screwy things).
So… have you guys considered that using it might not give the best experience to your readers? Especially when there’s a California-based non-profit alternative that isn’t run by a capricious Russian?
 


pokrface

Senior Technology Editor
21,512
Ars Staff
So… have you guys considered that using it might not give the best experience to your readers? Especially when there’s a California-based non-profit alternative that isn’t run by a capricious Russian?
Agreed—I have summoned Eric Bangeman and he has done the needful.
 

JohnCarter17

Ars Praefectus
5,734
Subscriptor++
“I would have been concerned if I were the one clearing this for Microsoft, but at the same time, I completely understand what this employee was doing,” Smith said. “No one wants to write fan fiction about books that are in the public domain.”

Too true. Imagine what would happen if they had used Robin Hood as an example. Then the concept of robbing the rich to feed the poor might get accidentally promoted by AI in the public consciousness.
 
Fahrenheit 451?
I was thinking more of the reports that ICE's Azure footprint has shot up from 400TB to 1,400TB very recently, with pious assertions from Redmond that they are sure that the prohibitions on mass surveillance of civilians in the license agreement are totally being followed.
 

wck

Seniorius Lurkius
34
Subscriptor++
The books are “one of the most famous and cherished series in literary history,” the blog noted, and fans could use the LLMs they trained in two fun ways: building Q&A systems providing “context-rich answers” and generating “new AI-driven Harry Potter fan fiction” that’s “sure to delight Potterheads.”

These quotes from the blog post make it sound like the blog post itself was written by an LLM.
 

Wheels Of Confusion

Ars Legatus Legionis
75,398
Subscriptor
“I would have been concerned if I were the one clearing this for Microsoft, but at the same time, I completely understand what this employee was doing,” Smith said. “No one wants to write fan fiction about books that are in the public domain.”
Too true. Imagine what would happen if they had used Robin Hood as an example. Then the concept of robbing the rich to feed the poor might get accidentally promoted by AI in the public consciousness.
Truly no company as capitalist as Disney would ingest a public domain literary character, especially one about rebellious public noncompliance and larceny against corrupt and greedy governments, and turn it into a marketable product.

[Attached image: 1771599591491.png]

Which just goes to show you that nobody would seriously want to make derivative works from the public domain!
 
Truly no company as capitalist as Disney would ingest a public domain literary character, especially one about rebellious public noncompliance and larceny against corrupt and greedy governments, and turn it into a marketable product.

View attachment 128717

Which just goes to show you that nobody would seriously want to make derivative works from the public domain!
Isn't that Disney's entire business model, in fact? From Snow White to the eternally delayed Jack and the Beanstalk movie... Heck, they don't even shy away from licensed properties, if you squint a little and look at Kimba the White Lion or The Book of Life at the right angle...
 

Baumi

Ars Tribunus Militum
2,453
“Microsoft could have used any dataset for their blog, they could have even chosen to use actual public domain novels,” another Hacker News commenter wrote. “Instead, they opted to use copywritten works that J.K. hasn’t released into the public domain (unless user ‘Shubham Maindola’ is J.K.’s alter ego).”
Pet peeve: It irks me when people who are apparently rather knowledgeable about copyright still don't seem to realize that it's "copyright" and "copyrighted", not "copywrite" and "copywritten". (It's about the right to make copies, not about copywriting.)
 
Pet peeve: It irks me when people who are apparently rather knowledgeable about copyright still don't seem to realize that it's "copyright" and "copyrighted", not "copywrite" and "copywritten". (It's about the right to make copies, not about copywriting.)
It's always been weirdly generic sounding to me that writing advertising is just called "copywriting". I feel like anything that's going to be mass produced should fit the literal definition of that word, from newspapers to PDFs. I wonder if it's one of those tricks, so they can say what they do for a living to the average person without being judged for their sins. "I'm a copywriter" "Oh, like books?" vs "I write ads." "Oh, you lie for a living?"
 

Abby Tangential

Smack-Fu Master, in training
47
It's always been weirdly generic sounding to me that writing advertising is just called "copywriting". I feel like anything that's going to be mass produced should fit the literal definition of that word, from newspapers to PDFs. I wonder if it's one of those tricks, so they can say what they do for a living to the average person without being judged for their sins. "I'm a copywriter" "Oh, like books?" vs "I write ads." "Oh, you lie for a living?"
It's certainly more specific than saying, "I'm a writer," which to most people means you write books.
 

GMBigKev

Ars Praefectus
5,671
Subscriptor
Truly no company as capitalist as Disney would ingest a public domain literary character, especially one about rebellious public noncompliance and larceny against corrupt and greedy governments, and turn it into a marketable product.

View attachment 128717

Which just goes to show you that nobody would seriously want to make derivative works from the public domain!

Ah yes, my furry awakening
 

Baumi

Ars Tribunus Militum
2,453
It's always been weirdly generic sounding to me that writing advertising is just called "copywriting". I feel like anything that's going to be mass produced should fit the literal definition of that word, from newspapers to PDFs. I wonder if it's one of those tricks, so they can say what they do for a living to the average person without being judged for their sins. "I'm a copywriter" "Oh, like books?" vs "I write ads." "Oh, you lie for a living?"
I don't know. Personally, I've always felt that the practice of calling marketing texts "copy" instead of "text" makes them already seem less valuable than "real" texts, as in books or newspaper articles. So, to me, "copywriter" always had a bit of a dismissive ring to it.
 
Or to Gregory Maguire (Wicked, based off of Wizard of Oz), or to Margaret Atwood (The Penelopiad, retelling of Homer's Odyssey), or to any of the authors that have been engaged in modern retellings of Grimms' fairy tales, or to the many authors that have expanded the Sherlock Holmes "universe", or...

(edited to fix typos as I noticed them)
One of the best musicals I’ve seen lately, Hadestown, is yet another retelling of Orpheus and Eurydice

Also, in general, see: The Hero with a Thousand Faces by Campbell

/Seriously can’t recommend Hadestown enough btw, folks. Anyone who wants to watch the Tiny Desk performance (and give NPR some support from YouTube’s ads):
https://youtu.be/XKwDFDDr_VA
 
"She asked the model to write a story in which Harry meets a new friend on the Hogwarts Express train who tells him all about Microsoft’s Native Vector Support in SQL “in the Muggle world.”"

Beginning eyeroll sequence...
I mean, it’ll probably be more readable than MS’ normal docs
 

jrmbalcones

Wise, Aged Ars Veteran
105
Subscriptor++
From the article:
“They don’t need to know any details to know that these properties belong to massive companies and aren’t free for the taking,” one commenter said.
as opposed to belonging to a small company or, gasp, an individual, whose property rights can be ignored safely. That sounds about right.
 
Intellectual Property is only for the oligarchs. Might makes right. Tech bros can steal content with impunity.

Microslop will see zero consequences for this. If a normal citizen did this they would be facing financial ruin.

Justice is truly blind; it's just two-tiered: one tier for the oligarchy, which is exempt, and one for the commoners.

Edit:
It's interesting that this blog was published in November 2024 and it's only recently been noticed. It looks like there is such robust engagement with Microslop's blog that it took this long for someone to come across this issue.
 

Lexomatic

Ars Praetorian
517
Subscriptor++
Microsoft proposed:
The books are “one of the most famous and cherished series in literary history,” [...] and fans could use the LLMs they trained in [...] generating “new AI-driven Harry Potter fan fiction” that’s “sure to delight Potterheads.”
For those who are not regular readers of fanfic: there is no shortage of genuine human-penned Potter fanfic out there, on Fanfiction.net and ArchiveOfOurOwn.org. Heaps and oodles. Several thousand times the word count of the original seven books, at the very least. It can be a chore to find something to one's liking, but it's more gratifying than asking for machine-extruded plot-like product.

Based on my own reading... There are prequels, sequels, inquels, unlikely romantic pairings, bashing deuteragonists like Ron and Dumbledore, redeeming antagonists like Draco and Petunia, changes to initial conditions, and explorations of the magical world (history, geopolitics, magical theory) that protagonist Harry was never sufficiently inquisitive to ask about. Much of it is poorly written, some of it is intriguing but with enough mechanical errors to set one's teeth on edge, and some of it very nearly constitutes original fantasy novels (insofar as SF/F is always about remixing tropes and styles).
 
One of the best musicals I’ve seen lately, Hadestown, is yet another retelling of Orpheus and Eurydice

Also, in general, see: The Hero with a Thousand Faces by Campbell

/Seriously can’t recommend Hadestown enough btw, folks. Anyone who wants to watch the Tiny Desk performance (and give NPR some support from YouTube’s ads):
https://youtu.be/XKwDFDDr_VA

The Lion King is basically a retelling of Hamlet, and The Lion King 2 a retelling of Romeo and Juliet. The Lion King 1 1/2 is a retelling of a later parody of Hamlet: Rosencrantz and Guildenstern Are Dead. Having said all that, they are far more than retellings; they are total reimaginings, reframing everything, changing the natures of characters, and with far less tragic endings. (Timon and Pumbaa are NOT dead.) In other words, we show a high level of imagination and can dramatically shift our storytelling to do more than just directly retell the whole story... but with lions.

What AI produces in its cribbing much more closely resembles Disney's remakes of its own animated archives. Instead of reimagining those stories, as the animated versions did to the fairy tales in the first place, corporate mandate forced the writers to basically do everything just shy of a shot-for-shot remake. A few little squiggles of creativity manage to squeak through, but by and large the remakes were pointless.
 

Trees

Wise, Aged Ars Veteran
149
Subscriptor
it's not even a good RAG pipeline, they should have at least included a reranker.

seems to be the pattern at this point: let an industry get huge by finding obviously terrible ways to make money, protected and emboldened by political capture. then, when the bottom falls out, declare them "too important" and bail them out.

glad we learned the right lessons from the financial crisis...
 

FelipeBG

Smack-Fu Master, in training
59
Subscriptor
This appears to be a somewhat common problem: if you do a web search for "archive redirecting to RT" you'll see many reports of it on Reddit, Hacker News, and other sites. Turning off your VPN reportedly fixes the issue; it's also possible that switching your DNS will fix it.

(FWIW, the link works fine for me—screenshot).
Interesting, yet another reason that it makes sense that they are removing those archive links from Wikipedia.
 
