Wikipedia bans Archive.today after site executed DDoS and altered web captures

The irony here is kind of amusing, if only because not that long ago (okay, well, maybe 26 years is a long time) I remember that Wikipedia was banned from being used as reference source material for college papers because "anyone could say anything and no one checks it", which was possibly true at the time. Oversight of the data was, at best, spotty.

These days, it seems like their reliability and accuracy is much better. But it's also another generation later, so I'd expect they had positive changes. Otherwise they'd be about as common and well known as ICQ is today.
Wikipedia has always had a requirement to cite sources, and at some point added warnings on pages that don't adequately do so. The general guidance in the period you’re talking about was often abbreviated as “don't cite Wikipedia”, but what that really meant, then and now, was “don't cite it as a primary source; use it to read the info and use it to find primary sources”.

I do think the article accuracy is much better, as expected over time, but it’s an encyclopedia, not a primary source. If you want to read about a topic it’s perfect; if you want to write an essay on the topic, use its sources, not the Wikipedia page itself. It’s a jumping off point for a deeper dive for that kind of thing.

To be clear none of that is an indictment of wikipedia, I genuinely think it’s a wonder of the modern world, probably the single best thing to come out of the internet, and probably the single greatest store of human knowledge in history, this is just about academic guidance on using it

Archive sites are a whole different ballgame. They literally can't cite sources because they are supposed to be the authoritative source if the original is lost, so they need to be trusted absolutely. The Internet Archive has that level of trust, and a foundational structure to maintain it. archive.is was always kinda a secondary option that never fully reached that level of trust, and now it has completely destroyed it in the name of some petty douchebaggery.
 
Upvote
400 (404 / -4)

42Kodiak42

Ars Scholae Palatinae
1,355
The irony here is kind of amusing, if only because not that long ago (okay, well, maybe 26 years is a long time) I remember that Wikipedia was banned from being used as reference source material for college papers because "anyone could say anything and no one checks it", which was possibly true at the time. Oversight of the data was, at best, spotty.

These days, it seems like their reliability and accuracy is much better. But it's also another generation later, so I'd expect they had positive changes. Otherwise they'd be about as common and well known as ICQ is today.
It's a tertiary source, so you're still not supposed to use it as a reference for any professional journalistic or academic work. Although, the reason has changed from "anyone can edit it" to normal tertiary source reasons and a unique failure mode of Wikipedia: Citogenesis.
It's still an amazing intellectual resource, but it's not a direct source of information or journalistic analysis of other sources. And it's generally accepted for casual, non-journalistic or academic efforts, but you can't use Wikipedia as a source when making anything that Wikipedia might use as a source.
 
Upvote
147 (148 / -1)

Wtcher

Ars Centurion
261
Subscriptor++
I've been pleased with Wikipedia overall as of late.

Aside from this escalation, I witnessed also an admin step in recently in an unrelated topic where a page was being weaponised by anonymous editors.

They corrected the malicious edit, then locked the page out from anonymous modifications, which seemed to be effective.
 
Upvote
154 (155 / -1)

DNA_Doc

Ars Scholae Palatinae
904
The irony here is kind of amusing, if only because not that long ago (okay, well, maybe 26 years is a long time) I remember that Wikipedia was banned from being used as reference source material for college papers because "anyone could say anything and no one checks it", which was possibly true at the time. Oversight of the data was, at best, spotty.

These days, it seems like their reliability and accuracy is much better. But it's also another generation later, so I'd expect they had positive changes. Otherwise they'd be about as common and well known as ICQ is today.
I still remember the Nature Special Report from 2005 - Wikipedia was almost as good as Encyclopedia Britannica then (at least, in the domains tested), and I'd like to think it's gotten better since.

Internet encyclopaedias go head to head
 
Upvote
104 (105 / -1)
The irony here is kind of amusing, if only because not that long ago (okay, well, maybe 26 years is a long time) I remember that Wikipedia was banned from being used as reference source material for college papers because "anyone could say anything and no one checks it", which was possibly true at the time. Oversight of the data was, at best, spotty.

These days, it seems like their reliability and accuracy is much better. But it's also another generation later, so I'd expect they had positive changes. Otherwise they'd be about as common and well known as ICQ is today.
I don't think that's really changed - wikipedia isn't considered an acceptable reference at the college level. College papers shouldn't be citing to a paper encyclopaedia for that matter.

The issue isn't so much accuracy as that, as a tertiary reference, it is supposed to be a collection of citations, not a source for them.
 
Upvote
162 (162 / 0)

thelee

Ars Tribunus Militum
1,902
Subscriptor
Although, the reason has changed from "anyone can edit it" to normal tertiary source reasons and a unique failure mode of Wikipedia: Citogenesis.

i thought the notion of citogenesis was amusing but hypothetical, but a couple years after i saw that term on xkcd, I was revisiting an article that I had edited but put in something that wasn't properly sourced, to see if it had been better sourced, or if I could add one (I forgot the article, probably a city wikipedia page). To my pleasant surprise, I saw a reference added! I followed it... to an external site that made the assertion, with a reference. Great! I followed the link from there... only to end up on the original wikipedia article again. Whoever had added the reference hadn't paid attention to the fact that the external site was simply regurgitating what the unsourced wikipedia article had said. It was a process exactly as described in xkcd, and it was directly linked to an edit i made, it was literally my fault!

i forget if i deleted it or did some legwork to find a better source, but i definitely didn't let it stand. and now i'm very cognizant of this being a real risk.
 
Upvote
215 (217 / -2)
i thought the notion of citogenesis was amusing but hypothetical, but a couple years after i saw that term on xkcd, I was revisiting an article that I had edited but put in something that wasn't properly sourced, to see if it had been better sourced, or if I could add one (I forgot the article, probably a city wikipedia page). To my pleasant surprise, I saw a reference added! I followed it... to an external site that made the assertion. Great! I followed the link from there... only to end up on the original wikipedia article again. Whomever had added the reference hadn't paid attention to the fact that the external site was simply regurgitating what the unsourced wikipedia article had said. It was a process exactly as described in xkcd, and it was directly linked to an edit i made, it was literally my fault!

i forget if i deleted it or did some legwork to find a better source, but i definitely didn't let it stand. and now i'm very cognizant of this being a real risk.
15 years ago... no wait...17 years ago (shit) I wrote a paper on the then-current accusations by Patrick Byrne that Overstock.com stock was being manipulated via Wikipedia edits. While he's a bit out there in many ways, it did turn out that there was funny business going on, and tautological references that circled from Wikipedia to supposedly primary sources and back were part of it; I verified that myself. I think there are more safeguards around that sort of thing now, at least for high-profile articles.
 
Upvote
28 (29 / -1)

Corporate_Goon

Ars Tribunus Militum
2,334
Subscriptor
The irony here is kind of amusing, if only because not that long ago (okay, well, maybe 26 years is a long time) I remember that Wikipedia was banned from being used as reference source material for college papers because "anyone could say anything and no one checks it", which was possibly true at the time. Oversight of the data was, at best, spotty.

These days, it seems like their reliability and accuracy is much better. But it's also another generation later, so I'd expect they had positive changes. Otherwise they'd be about as common and well known as ICQ is today.
I don't see any irony here - it remains true that Wikipedia should not be cited as a primary source, because it's not a primary source. Wikipedia has always cared about linking to quality primary sources, which is the impetus behind delisting archive.today.
 
Upvote
118 (121 / -3)

kaibelf

Ars Tribunus Militum
2,034
Subscriptor
I have to wonder if any of this was worth it to archive.today. I’m also very glad that the Wikimedia Foundation doesn’t believe in sacred cows when it comes to shutting down lying scumbags despite it being potentially more expedient to turn a blind eye. Can they please run our government?
 
Upvote
116 (117 / -1)

Oregano

Ars Scholae Palatinae
737
Subscriptor
We're rapidly getting to the point that we'll need some type of blockchain-type ledger for determining archival "truth". I'm actually surprised it took this long to realize that a single point of "failure" is a bad thing.
Paper references have the laudable property that they cannot be remotely altered to say something different. Sometimes inconvenience is a feature, not a bug.
 
Upvote
88 (90 / -2)

Carbonado

Smack-Fu Master, in training
83
Reposting this from other thread: I figure it's a mite late to bring this up, but the best alternatives I've found so far are, while not flawless, good enough-ish.

Megalodon.jp: in Japanese. Caps you at 60 pages daily. Seems to be collecting only 30k page/month; kind of small scale.
Ghostarchive.org: Doesn't have canonical links, only shortened links.

I've been looking at Archivebox for personal repositories, then there's perma.cc if you've got the money (I don't).
 
Upvote
28 (28 / 0)
/good/.
we get that .today does a better jorb at taking snapshots of sites, and doesn't have the vast denylists of .org but. beyond all the inappropriate behaviour,

why the fuck would they rely this heavily on some website maintenance by some guy somewhere? guy can at any point say "fuck it" and pull the plug and all those links would be dead. fur sumfen like an archive, having some kind of institutional inertia is valuable
 
Upvote
8 (15 / -7)

adespoton

Ars Legatus Legionis
10,690
i thought the notion of citogenesis was amusing but hypothetical, but a couple years after i saw that term on xkcd, I was revisiting an article that I had edited but put in something that wasn't properly sourced, to see if it had been better sourced, or if I could add one (I forgot the article, probably a city wikipedia page). To my pleasant surprise, I saw a reference added! I followed it... to an external site that made the assertion. Great! I followed the link from there... only to end up on the original wikipedia article again. Whomever had added the reference hadn't paid attention to the fact that the external site was simply regurgitating what the unsourced wikipedia article had said. It was a process exactly as described in xkcd, and it was directly linked to an edit i made, it was literally my fault!

i forget if i deleted it or did some legwork to find a better source, but i definitely didn't let it stand. and now i'm very cognizant of this being a real risk.
Decades ago, I stopped editing Wikipedia and started writing my own articles elsewhere, specifically because of this. I'd rather people don't trust what I write and use it as a single unverified source, than that it get caught up in citogenesis.
 
Upvote
-2 (9 / -11)

xoa

Ars Legatus Legionis
12,363
Subscriptor++
why the fuck would they rely this heavily on some website maintenance by some guy somewhere?
Because there is no other choice with the same benefits. Everything here has tradeoffs due to the state of the law.
guy can at any point say "fuck it" and pull the plug and all those links would be dead.
But so can all the original sites. And they do, all the time. We have decades of history of that at this point.
fur sumfen like an archive, having some kind of institutional inertia is valuable
But an institution is a centralized legal target, and the underlying behavior here isn't legal. But it should be legal. But it isn't. Hence the quandary! I wish people wouldn't try to oversimplify this sort of thing quite so much.
 
Upvote
56 (59 / -3)

adespoton

Ars Legatus Legionis
10,690
/good/.
we get that .today does a better jorb at taking snapshots of sites, and doesn't have the vast denylists of .org but. beyond all the inappropriate behaviour,

why the fuck would they rely this heavily on some website maintenance by some guy somewhere? guy can at any point say "fuck it" and pull the plug and all those links would be dead. fur sumfen like an archive, having some kind of institutional inertia is valuable
https://imgs.xkcd.com/comics/dependency_2x.png
 
Upvote
20 (22 / -2)

Lliwynd

Ars Centurion
246
Subscriptor++
I can see that Wikipedia running its own archive site could be problematic (all sorts of potential copyright and other IP issues). But having a service that takes a checksum of pages, and then putting those checksums in their citations, might be useful. It would at least allow people to verify that the linked articles haven't changed.
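
A minimal sketch of what that could look like, assuming the citation stores a SHA-256 hex digest next to the URL. The `Citation` type and the `verify_snapshot` helper are hypothetical names for illustration, not anything Wikipedia actually provides, and the page bytes are made up:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record: a URL plus the digest taken at cite time."""
    url: str
    sha256: str


def digest(page_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the raw page bytes."""
    return hashlib.sha256(page_bytes).hexdigest()


def verify_snapshot(citation: Citation, page_bytes: bytes) -> bool:
    """True only if the bytes match what was recorded in the citation."""
    return digest(page_bytes) == citation.sha256


# Made-up page content standing in for a fetched article.
original = b"<html><body><p>Quoted claim.</p></body></html>"
cite = Citation("https://example.com/article", digest(original))

# Someone later alters the text of the capture.
tampered = original.replace(b"Quoted", b"Altered")

print(verify_snapshot(cite, original))  # True
print(verify_snapshot(cite, tampered))  # False
```

This only detects that something changed. It says nothing about what changed, and it needs the exact bytes that were hashed in the first place.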
 
Upvote
47 (50 / -3)

Excors

Ars Centurion
364
Subscriptor++
Paper references have the laudable property that they cannot be remotely altered to say something different. Sometimes inconvenience is a feature, not a bug.
Unfortunately paper references also have the property that it's trivial for an LLM to invent a plausible-sounding one, and very arduous for someone to travel to a library that contains a paper copy in order to verify the reference. Verifiability is crucial nowadays, you can't just trust that a reference is genuine.
 
Upvote
65 (70 / -5)

xoa

Ars Legatus Legionis
12,363
Subscriptor++
But having a service that takes a checksum of pages, and then putting those checksums in their citations, might be useful. It would at least allow people to verify that the linked articles haven't changed.
Note this is extremely non-trivial though. Checksums are extremely simplistic, they purely take in some given chunk of bits and spit out the result of their algorithm, changing with any single bit of change. They don't have any concept of "content" vs "formatting" or "GUI chrome". So if a site merely changes its logo or CSS or contact number or any one of an endless number of things that have nothing to do with the content itself, the checksum will also change.

In other words, the only way to have a meaningful checksum is to do so with a completely static snapshot as the reference, which then takes us right back to the original core problem. Checksumming would be useful if applied to something like archive.today itself, but it can't replace it. Wikipedia might be able to skate slightly in various ways, like having purely internal content archives that only employees of the organization could see directly, not the general public, which could get them good fair use coverage. However, the general public would then have to just trust the editors, and independent verifiability goes out the window. Scaling becomes harder too. It'd be better than nothing but a real change too and still a lot of work.
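
To make the "any single bit" point concrete, here is a small sketch using SHA-256 from Python's standard library. The page markup, the `render` helper, and the logo filenames are invented for illustration; the article text is byte-for-byte identical in both versions and only the logo reference differs:

```python
import hashlib

# Made-up article body that stays exactly the same in both page versions.
ARTICLE = "<p>The quoted claim appears here, unchanged.</p>"


def render(logo: str) -> bytes:
    """Build a toy page: site chrome (a logo) wrapped around the article."""
    return f"<html><body><img src='{logo}'>{ARTICLE}</body></html>".encode("utf-8")


def sha256_hex(data: bytes) -> str:
    """Checksum over the raw bytes; it has no notion of content vs. chrome."""
    return hashlib.sha256(data).hexdigest()


before = render("logo-v1.png")
after = render("logo-v2.png")

# The article text is present and identical in both versions...
same_article = ARTICLE.encode("utf-8") in before and ARTICLE.encode("utf-8") in after
print(same_article)  # True

# ...but the checksums no longer match, because one byte of chrome changed.
print(sha256_hex(before) == sha256_hex(after))  # False
```

So a mismatch against the live page tells you nothing about whether the cited content itself was touched, which is why the checksum only means something against a static snapshot.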
 
Upvote
79 (79 / 0)

mannyvelo

Ars Scholae Palatinae
1,175
Subscriptor
Paper references have the laudable property that they cannot be remotely altered to say something different. Sometimes inconvenience is a feature, not a bug.

Paper references are actually quite difficult to hunt down these days. Do you have a copy of that journal? Does your library? And libraries do purge old things occasionally, or at least the public ones do.
 
Upvote
33 (33 / 0)

islane

Ars Scholae Palatinae
900
Subscriptor
I'm both glad at this and a bit disappointed.

Glad that Wikipedia used its heft to shut down a very bizarre instance of targeted trolling.

Disappointed in the operator of this site for their behavior and stupidity. Now more than ever we need a freewheeling, obsessive internet archiving system that disregards IP and paywalls. If we are limited to archives which "play by the rules", then we cannot (easily) retain the content of many news sites - this plays right into the hands of the authoritarians who wish to redact and retract free speech/free press.
 
Upvote
60 (61 / -1)

sk999

Smack-Fu Master, in training
75
I think it was a good decision. ArchiveToday is turning their visitors into attackers and they've modified the content of archived pages, destroying trust in their archive in the process. It's a shame because it's a useful service.

I also don't understand why someone tried to expose the operator(s), btw. Again, it's a useful service, they have enough enemies as it is... why do that? Yes, it's public info, but just because you can, it doesn't mean you should.

Anyway, from Wikipedia's point of view, I think this is the right decision.
 
Upvote
-4 (14 / -18)

SubWoofer2

Ars Tribunus Militum
2,549
I've been pleased with Wikipedia overall as of late.

Aside from this escalation, I witnessed also an admin step in recently in an unrelated topic where a page was being weaponised by anonymous editors.

They corrected the malicious edit, then locked the page out from anonymous modifications, which seemed to be effective.

My QC barometer was the Imelda Marcos entry, which at one stage ran for tens of thousands of words of hagiography and failed entirely to mention that - according to The Beatles - she "wanted her own Beatle" and, in essence, arranged a state-authorised kidnapping, after which The Beatles never toured again. The hagiography has been trimmed from English wikipedia. Dunno about the Spanish version.

Point being, there's enough editors that a critical mass can overrule the whims of page-territorial editors, if it happens that the community turns their collective gaze to such pages.
 
Upvote
25 (26 / -1)
We're rapidly getting to the point that we'll need some type of blockchain-type ledger for determining archival "truth". I'm actually surprised it took this long to realize that a single point of "failure" is a bad thing.

EDIT: could someone that downvoted me explain why? Not upset (it would be a silly thing to be upset about), I'm just curious about why blockchain-based proof of immutability over archival information is a bad thing.
I can't read people's minds but my guess is the word "blockchain" set them off.

As for Wikipedia sourcing efforts, while I do like the idea of Wikipedia having their own archive of things, I'm not sure how that would work with current copyright law. Physical books can be bought and stored or scanned, ebooks probably can at least be recovered somehow from whatever digital medium they're on, and stuff on the internet can be snapshotted, but it would need some sort of assurance that the material is verifiable by whoever needs to verify it, without also being up for grabs for anyone that would pirate copyrighted works using Wikipedia's copy.
 
Upvote
27 (30 / -3)