AI built from 1800s texts surprises creator by mentioning real 1834 London protests

plectrum

Ars Scholae Palatinae
673
Subscriptor
I'm not sure that this is surprising: he prompted a chatbot for a statement about 1834, and it produced what was probably a major event for that year, along with a big-name politician (Palmerston would later become Prime Minister, I believe?) who was involved and likely to be mentioned in the most sources, especially later sources.
I'm trying to work out what the event referred to might have been. The Reform Act passed in 1832, and various folks were unhappy that there remained a requirement to own property to vote. This would later become the Chartist movement of the late 1830s-1840s, but that's too late for our tale. The Tolpuddle Martyrs were sentenced to transportation to Australia for trying to form a labour union in 1834, which caused mass demonstrations in their support. But Palmerston was foreign secretary at the time and all his speeches in Hansard in that year relate to foreign matters.

So, unless someone has a better source, even the link between '1834', 'Palmerston' and 'protest' seems tenuous.

(Also I'm no expert but the article image looks circa 1840-50s to me, not 'late 19th century'. Palmerston was dead by 1865 anyway)
 
Upvote
32 (33 / -1)
So some comp sci student who evidently doesn't know much about history (or even how to learn about history) was surprised that the LLM he trained actually output true information from the training set? Wow. This really makes me rethink the reasons I subscribe to Ars.
In a hundred years’ time, when the tech-priests finally pluck up the courage to ask Condensotron The Ultimate Epitomiser to explain the entire LLM revolution in its concisest possible form, this is going to be the output.
 
Upvote
11 (11 / 0)
There are some Victorian-sounding bits in there, but mostly the text is gobbledygook. It switches halfway down from past tense to future tense. Parts are nonsensical and ungrammatical ("was not bound in the way of private", "who first settled in the Gospel at Jerusalem", "a record of the prosperity and prosperity"), not to mention bad punctuation ("re counted", "be'known"). Other than the mention of Lord Palmerston, there is no indication of what happened in that year. It's word salad. The problem is that the remoteness of that era tempts one to excuse the nonsense as archaic. But no, "the day of law" is not Victorian English.
 
Upvote
73 (73 / 0)

argt

Smack-Fu Master, in training
69
This is a bit misleading. v0 and v0.5 were trained from scratch. But the one that provided the best answer is using phi-1.5, which is already pre-trained by Microsoft and has 1.3b parameters. The student probably fine-tuned phi-1.5 with 700m parameters of specific Victorian-era texts.

It's still an interesting application, but the improvement is to be expected.
 
Upvote
-9 (3 / -12)

darkowl

Ars Tribunus Militum
1,995
Subscriptor++
I think this would be noteworthy if it had taken disparate pieces of information and drawn conclusions of things that had occurred but perhaps were not fully known by researchers before. That is, something novel.

But... "I put some text into an LLM and it output something that I didn't know, so I looked it up and it was real!" seems a bit of a lacklustre outcome. I'm sure it'll hallucinate events from the 1800s just as effectively, making all this a moot point.
 
Upvote
31 (31 / 0)
I'm trying to work out what the event referred to might have been. The Reform Act passed in 1832, and various folks were unhappy that there remained a requirement to own property to vote. This would later become the Chartist movement of the late 1830s-1840s, but that's too late for our tale. The Tolpuddle Martyrs were sentenced to transportation to Australia for trying to form a labour union in 1834, which caused mass demonstrations in their support. But Palmerston was foreign secretary at the time and all his speeches in Hansard in that year relate to foreign matters.

So, unless someone has a better source, even the link between '1834', 'Palmerston' and 'protest' seems tenuous.

(Also I'm no expert but the article image looks circa 1840-50s to me, not 'late 19th century'. Palmerston was dead by 1865 anyway)
It's the Poor Law reform act of 1834:
https://www.nationalarchives.gov.uk...ian-poor/protesting-against-the-new-poor-law/
https://www.nationalarchives.gov.uk/education/resources/1834-poor-law/
 
Upvote
12 (12 / 0)

maverick

Ars Tribunus Militum
1,681
Subscriptor
"But what makes this episode especially interesting is that a small hobbyist model trained by one man appears to have surprised him by reconstructing a coherent historical moment..."

In other words: Person ignorant of history surprised to learn some actual history from an LLM.
If he was surprised by this, it sounds like he's pretty ignorant about LLMs, too...
 
Upvote
14 (16 / -2)
The thing that drives me up the wall here isn't the "golly gee whiz an LLM trained on newspapers from 1834 managed to average out two true details from 1834," though that is annoying, it's the credulous repetition of the student saying "oh wow it got so much better with more data just imagine how good it'll be with even more! Maybe it'll even think like someone from the 1800s!" as if we haven't just spent, collectively, tens of billions of dollars, thousands of tons of carbon and entire reservoirs of freshwater figuring out that no it absolutely does not work that way.
Eh I'd have to disagree with your conclusion somewhat, more data and more compute to train on all that data faster absolutely does result in generally better outputs, UP TO A POINT.
That's the entire reason why so many billions have been poured into acquiring more of both those components: the line did go up pretty sharply for the correlation between data in and better results out. The problem is that rather than the line continuing to go up, it was just the first half of the bell curve.
Now we seem to have raced over the top and are merrily rolling down the far side of the curve somewhere, while everyone that poured billions into the belief that any crest was so far away as to not be worth considering is still clinging to that belief, even as they burn more and more of everything for ever smaller gains.
 
Upvote
9 (10 / -1)



I think this is why reporting on AI is so hard. Ars is a much better source than most (it's one of my few sources of news in reality), but yet this kind of a sentence makes it through. There's no thought processes of past eras to actually be interrogated here.
I’m not disagreeing, but I’m curious if you can explain why you are saying this so confidently. It seems like maybe how we put sentences together does reflect thought patterns. But maybe you know that’s not true? Can you say why?
 
Upvote
-9 (4 / -13)

fiyz

Smack-Fu Master, in training
30
Wait, so this event existed in the training data in parts. The guy asked a question, and the LLM responded with the statistical connections? I'm trying to find out what is the story here. This is... how LLMs work?
We have to remember, our current iteration of AI is founded on big data, whose earliest problems were querying the data. Nothing new here? I didn't see much about how he trained his model, so it's quite possible he just rented GPU time... Which also isn't big news.

I think the story is that he's a human using AI constructively, as opposed to the humans using it for smut.
 
Upvote
-7 (5 / -12)

Tanngrisnir

Smack-Fu Master, in training
13
The line about using a different tokenizer got to me. What on earth is the upshot to using a custom tokenizer for your Victorian Text? Especially if you're using the GPT-2 architecture as the article seemed to imply - GPT-2 uses Byte Pair Encoding if my memory serves and I don't think there's any argument that there's any "modern" information to be leaked into your model that way?
 
Upvote
0 (1 / -1)

Tanngrisnir

Smack-Fu Master, in training
13
I’m not disagreeing, but I’m curious if you can explain why you are saying this so confidently. It seems like maybe how we put sentences together does reflect thought patterns. But maybe you know that’s not true? Can you say why?
Obviously I'm not the OP but I think it could be better phrased as no new thought patterns. Sure, you can get some information about how historical figures thought from their writing, but you could also get that information from just reading it?
 
Upvote
-6 (0 / -6)

Baumi

Ars Tribunus Militum
2,453
Knowing that a model can be trusted to dig out facts after being fed copious amounts of text makes it much easier to analyze said texts, especially if it can cite the excerpts it used to come up with a particular inference.
The hook of the story is: “Guy trains LLM that then gives a factually correct response to a query.” That’s not really news. Nobody seriously claims that LLMs are wrong all the time.

It’s why people use LLMs: because a lot of times the stochastic parrot produces results that contain facts from the data it was trained with.

The problem is that they are wrong far too frequently to be useful in many of the kind of mission critical functions that people are using them for.
 
Upvote
12 (14 / -2)

umichans

Smack-Fu Master, in training
57
Yuck, why would you train the LLM on something so disgusting, soon it’ll be whispering sweet nothings of ethnic cleansings, awesome genocides and yummy Corn Laws.

The LLM will start referring to itself as Churchillbot. Whispering about those dirty Irish, disgusting Indians and awful Mau-Maus (Kenyans).
 
Upvote
-17 (2 / -19)

RZetopan

Ars Tribunus Angusticlavius
7,574
Granted the story appears peripheral, and perhaps it is - but heck, who knows where LLM will go?
Based on all the relevant evidence, it will end up here:
1755964536970.gif
 
Upvote
5 (5 / 0)

Tanngrisnir

Smack-Fu Master, in training
13
I wonder what an AI trained on the entirety of Project Gutenberg and nothing else would be like?
Not good. There's not nearly enough text there for anything more than a generic text generator. It would probably appear coherent without saying anything. Probably the same level as my 4 year old 🤣
 
Upvote
1 (2 / -1)
I think this is a fun use case. Like the other comments it feels like none of what was produced should be that surprising, his model just reached a large enough size to become coherent.


I think this is why reporting on AI is so hard. Ars is a much better source than most (it's one of my few sources of news in reality), but yet this kind of a sentence makes it through. There's no thought processes of past eras to actually be interrogated here.
Also, the thing about this sentence is it could have just as easily been accomplished by... randomly flipping through the training material and reading it??
 
Upvote
9 (10 / -1)

plectrum

Ars Scholae Palatinae
673
Subscriptor
Is it though? What's the evidence? Or is that just human projection into the gibberish?

The text has three identifiable facts: 1834, a protest and Palmerston. The above is a protest in 1834 but what did Palmerston have to do with it? I'd wager that if you took any two facts and searched you'd find something ('protest 1834', 'protest Palmerston', 'Palmerston 1834'). But to have any kind of historical meaning the text has to align all three facts. Otherwise it's just probability theory and a short enough word salad is going to be something anyone can post-hoc ascribe meaning to which was not actually present in the text.

(See also the people who find numerical patterns in the bible for similar kinds of effects)
 
Upvote
24 (26 / -2)
Fact-checked with Google? Like, dug into actual web-site search results? Or relied on Google's AI search result summary? Is this actually a case of using AI to fact-check AI?

According to the screenshot provided in the readme, AI fact-checks AI:
 

Attachment: fichier copié.png (234.9 KB)
Upvote
2 (2 / 0)

Chuckstar

Ars Legatus Legionis
37,251
Subscriptor
I'm trying to work out what the event referred to might have been. The Reform Act passed in 1832, and various folks were unhappy that there remained a requirement to own property to vote. This would later become the Chartist movement of the late 1830s-1840s, but that's too late for our tale. The Tolpuddle Martyrs were sentenced to transportation to Australia for trying to form a labour union in 1834, which caused mass demonstrations in their support. But Palmerston was foreign secretary at the time and all his speeches in Hansard in that year relate to foreign matters.

So, unless someone has a better source, even the link between '1834', 'Palmerston' and 'protest' seems tenuous.

(Also I'm no expert but the article image looks circa 1840-50s to me, not 'late 19th century'. Palmerston was dead by 1865 anyway)
The article explains which protests, although I don’t understand the claimed link to Palmerston. The protests were against the amended Poor Law of 1834.

I would point out that the actual LLM output seems to just mention it as having been the era in which Lord Palmerston was an important figure, so from that perspective the LLM is correct.
 
Upvote
7 (7 / 0)
Is it though? What's the evidence? Or is that just human projection into the gibberish?

The text has three identifiable facts: 1834, a protest and Palmerston. The above is a protest in 1834 but what did Palmerston have to do with it? I'd wager that if you took any two facts and searched you'd find something ('protest 1834', 'protest Palmerston', 'Palmerston 1834'). But to have any kind of historical meaning the text has to align all three facts. Otherwise it's just probability theory and a short enough word salad is going to be something anyone can post-hoc ascribe meaning to which was not actually present in the text.

(See also the people who find numerical patterns in the bible for similar kinds of effects)
Palmerston was a member of the government that passed the legislation. He was Secretary at War from 1809 to 1828 and oversaw the defeat of Napoleon and the post-war settlement at the Congress of Vienna. By 1834, Palmerston had been a senior, well-known national politician for a quarter of a century.
 
Upvote
2 (4 / -2)

cuvtixo

Ars Scholae Palatinae
1,026
I think this is a fun use case. Like the other comments it feels like none of what was produced should be that surprising, his model just reached a large enough size to become coherent.


I think this is why reporting on AI is so hard. Ars is a much better source than most (it's one of my few sources of news in reality), but yet this kind of a sentence makes it through. There's no thought processes of past eras to actually be interrogated here.
Whoa, so original written text doesn't reveal thought patterns? If so, are thought patterns totally inexplicable and indescribable? Isn't your belief that there are no thought processes of past eras to be interrogated itself a thought pattern? Are you restricting "thought pattern" to neurochemicals and sparks amongst neurons? Your dismissal is actually a bold claim. Many here want to take a "common sense" critique of this article on AI, but it's possible there is no common sense from which anyone can take a stand when it comes to AI. If you don't like the material, just move along.
 
Upvote
-12 (3 / -15)

Redsnertz

Ars Scholae Palatinae
786
"Coherent"
It was the year of our Lord 1834 and the streets of London were filled with protest and petition. The cause, as many re counted, was not bound in the way of private, but having taken up the same day in the day of Lord Palmerston, the public will receive a short statement of the difficulties under which the day of law has reached us. It is a matter of deep regret, that the present events in the history of the world are clear, and consequently will be'known. It is not true that the very men who first settled in the Gospel at Jerusalem should have so extensive and so interesting a record of the prosperity and prosperity

I'm... not sure that word means what you think it does.
 
Upvote
20 (20 / 0)

graylshaped

Ars Legatus Legionis
67,711
Subscriptor++
Palmerston was a member of the government that passed the legislation. He was Secretary at War from 1809 to 1828 and oversaw the defeat of Napoleon and the post-war settlement at the Congress of Vienna. By 1834, Palmerston had been a senior, well-known national politician for a quarter of a century.
I don’t think anyone is suggesting that information wasn’t in the training database. The suggestion was that because it was in the database, there is a statistical chance that the name may appear when looking to fill in a blank after the phrase “It was the year of our Lord 1834.” It probably isn’t a subplot Eco was coerced into removing from Foucault’s Pendulum because it hit too close to home for the Pentaverate.
 
Upvote
8 (8 / 0)

deverefan

Smack-Fu Master, in training
2
Subscriptor
I think this is a fun use case. Like the other comments it feels like none of what was produced should be that surprising, his model just reached a large enough size to become coherent.


I think this is why reporting on AI is so hard. Ars is a much better source than most (it's one of my few sources of news in reality), but yet this kind of a sentence makes it through. There's no thought processes of past eras to actually be interrogated here.
Agree it is fun. Also, the custom encoding mentioned in the article is going to be a big deal. word2vec would not be a good choice.
More needs to be said but space is not given. For a 16C model, yet another encoding would be needed.
 
Upvote
0 (1 / -1)
The text has three identifiable facts: 1834, a protest and Palmerston.
Not even that. "Protest and petition" is a legal term, something like a written legal appeal. It doesn't mean people taking to the street. In late Victorian English you have "protest meetings" and "meetings of protest" but "protest" alone, meaning a public demonstration, is a 20th century innovation (at least according to examples in the Oxford English Dictionary). So this is just a random phrase stuck in the text.

1834 was provided by the prompt. So the only thing that the LLM provided was Palmerston, among "Gospel", "Jerusalem", "prosperity", and other bits irrelevant to the London unrest or anything else that happened in 1834. That is like saying that your prayers to the Spaghetti Monster are what made your number come up on the roulette wheel, ignoring the times it didn't.
 
Upvote
22 (22 / 0)

NetMage

Ars Tribunus Angusticlavius
9,743
Subscriptor
There's no thought processes of past eras to actually be interrogated here.
I’m not sure that’s true. If you believe that the writings of a particular person reflect the thought processes of that person (and what is the alternative?), then you have to believe that analyzing the writings is a step toward analyzing their thought processes. So analyzing the collected writings of a group of people that share something in common should be a step toward analyzing their thought processes.
 
Upvote
-6 (2 / -8)