"MS Fnd in a Lbry" by Hal Draper in 1961.Reminds me of a science fiction short story whose title and author I have long forgotten. It was written from the viewpoint of an alien archaeologist analyzing the collapse of civilization on Earth. The story involved many iterations of information density, and, still. a storage building which extended clear through the planet and a long way out at each end.
Children's 12 year primary education was devoted to mastering the indexing system. It was the collapse of the indexing system which cost Earth all its accumulated knowledge.
The story is in the form of a report written by Yrlh Vvg, an anthropologist from an alien civilization who investigates the remains of human civilization approximately 175,000 yukals into the future. It turns out that humankind's fall was brought about by information overload and the inability to catalog and retrieve that information properly.
The title of the short story comes from the fact that all redundancy - and vowels - had been removed from our language in order to shrink the volume of information. Finally, the sum of all human knowledge was compressed by means of subatomic processes and stored away in a drawer-sized box. However, access to that information required complicated indices, bibliographies, etc., which soon outgrew the size of the knowledge itself.
The use of indices grew exponentially, comprising a pseudo-city, pseudo-planet and eventually a pseudo-galaxy devoted to information storage. At this point, a case of circular reference was encountered, and the civilization needed to refer to the first drawer-sized box to find the error. However, this drawer had been lost in the pseudo-galaxy, and soon the civilization fell apart while trying to locate the first drawer.
It turns out that the anthropologist's civilization is actually heading down the same path.
> Even the Library of Alexandria wasn't as big a deal as people make it out to be for exactly that reason. It was not the catastrophic event that people make it out to be.

This information genuinely made me happier upon learning it; thank you, Martin Blank. I've often wondered what information humanity lost from the multiple burnings of that library, and it's nice to hear the answer is actually "not much".
This is a perennial topic on r/AskHistorians. Very little is believed by historians to have been permanently lost by the various disasters that occurred there because, while it was one of the larger and more prestigious libraries of the ancient world, it was not remotely the only one. It wasn't even the only one in Egypt, which had a number of libraries. Dr. Peter Gainsford (aka kiwihellenist) wrote at length about it here, including explaining the source of the myth of its destruction setting back civilization: Carl Sagan on a segment of Cosmos in 1980. Where he got the information or whether he just made it up is not at all clear, but it immediately took hold, and historians have been trying to dispel it for almost half a century. If you search for askhistorians library alexandria, you'll get links to a bunch more scholarly responses about what made it special and, more importantly, why it was not as special as people think.
> Why are we developing a system that will store data for millennia? The way we're going, we won't be around for more than 20 more years.

I heard that lame joke already 10,000 years ago.
All of this is great, but has storing the data ever actually been the problem? I'm decidedly ignorant here, so I'm asking a genuine question. My read on this was always that it was the data itself that became obsolete, as file formats aged out, software changed, etc. If you don't have software that can read the data, what good is the data?
> With tapes the problems are different. They are much more reliable than HDDs, and they are guaranteed for 30 years, so normally there is no need to worry that they will go bad during storage. However, the tape drive that you have may break, and new drives for the old tape format may no longer be available to buy. Because of this, after at most 10 to 15 years you must copy the tapes to a newer tape format to ensure that you are still able to read them.

There's no guarantee a tape will last 30 years.
They also bitrot just like HDDs.
I've had a tape snap in the drive.
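As a rough illustration of what the "copy every 10 to 15 years" policy quoted above implies over a long horizon, here is a small sketch. Every number below is made up for illustration; none of them comes from the comments or the article.

    # Illustrative only: how much re-copying a "migrate every N years" policy implies.
    archive_pb = 1.0          # petabytes kept on tape (assumed)
    horizon_years = 100       # how long the archive must survive (assumed)
    migrate_every = 12        # roughly the 10-15 year window mentioned above

    migrations = horizon_years // migrate_every
    print(f"{migrations} migrations over {horizon_years} years, "
          f"{migrations * archive_pb:.0f} PB of data copied in total")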
> And in Voyager, bio-neural gel packs... and in at least one episode, those gels got sick.

Those packs are mostly processors, not storage. And those episodes were Learning Curve and Macrocosm.
They had extra processors IN storage... does that count?
Yes.

No.
(I'm always impressed by those who can remember episodes by name. I don't think I can recall the title of even one episode of any of my favorite shows.)
> All of this is great, but has storing the data ever actually been the problem? [...] If you don't have software that can read the data, what good is the data?

software is easy. even reverse-engineering formats isn't seriously hard.
> As a point of comparison, this is very roughly 1% of the volumetric density of a 2 TB M.2 SSD (I didn't bother with the power-of-ten versus power-of-two factor). That being the entire drive, not the chips contained therein. I don't know what the entire glass "system" equivalent volume is, but even that 1% represents a severe practical penalty applied to the SSD. So nothing to sneeze at, but there's definitely a ways to go.

except SSDs are volatile media (unless your horizon is only a year or so).
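For anyone who wants to redo the back-of-the-envelope comparison in the quote above, here is a small sketch. The 22 x 80 mm M.2 2280 footprint is standard; the ~2.4 mm thickness is an assumption, and the "roughly 1%" figure is taken from the comment rather than computed from the article.

    ssd_capacity_tb = 2.0
    ssd_volume_cm3 = (22 * 80 * 2.4) / 1000          # mm^3 -> cm^3, whole drive
    ssd_density = ssd_capacity_tb / ssd_volume_cm3   # roughly 0.47 TB/cm^3

    glass_density = ssd_density * 0.01               # the comment's "roughly 1%" estimate
    print(f"SSD:   {ssd_density:.3f} TB/cm^3")
    print(f"glass: {glass_density:.4f} TB/cm^3 -> about {1 / glass_density:.0f} cm^3 per TB")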
> sure. but is that the kind of data someone would think needs to be readable in 10,000 years? presumably, if people in 12,000 CE have the tech to read these glass plates they aren't going to care much about every frame of data an ancient telescope generated. they'll have much better telescopes. the stuff people will want to store (effectively) forever will be more consequential.

Astronomers today have much better telescopes than astronomers did 100 years ago, along with much better cameras for recording the light those telescopes gather. Yet today's astronomers still refer to old images stored on glass photographic plates, because comparing how things look today with how they looked in the past is an important part of astronomy. The data may be old and of far inferior quality to contemporary data, but the fact that it is old is what gives it its value.
The 10,000 years of proper motion of stars would make that data marginally useful. Only continuous observation for 10,000 years would tell you something; if you have continuous observation, you don't need to go back 10,000 years.
And before cameras were invented astronomers used notes and hand-drawn diagrams ... and those too are still referred to. For example, ancient astronomers recorded the dates and locations of supernovae and those records are matched to contemporary supernova remnant data. Knowing the ages of some of the supernova remnants astronomers have found is priceless.
It would be impossible to find an astronomer who would not love to get his hands on 10,000 year old astronomical data.
A little fun reading: en.wikipedia.org/wiki/History_of_supernova_observation
> Maybe the larger question is how we learn to get rid of data. We seem to accumulate massive amounts of information, and yet I don't think we are very good at getting rid of stuff we no longer need. I am not just talking about endless cat videos, but realistically, how many of the megabytes, if not gigabytes, of data do we all have on our laptops that we will never look at again? Maybe it is our own fault for being digital pack rats that are now fueling AI training systems. In other words, maybe it isn't all that bad to lose stuff every once in a while. I think of the hundreds of thousands of emails that I have on Gmail and how there's no good way of cleaning them out, so I just archived them.

You never know when cat behavior specialists will want to analyze as many of those videos as possible to gain insights into cat psychology.
TBH, AI training has validated my data hoarding!
> Existential question is "What information would we want to have handy for 10k years?" Bar tabs? Construction methods for concrete and wood buildings? DNA sequences? Court records and deeds? And how would we store the plans for the machine that can read this stuff? Fun questions.

That last bit is the biggest hurdle to this tech, especially since it relies on a deep learning AI model to interpret the etchings and distinguish focus from noise. That's VERY complex and not something those future generations can simply reverse engineer on their own. The instructions and the code would probably need to be macroscopically stored as large engravings on the stone walls of the building housing these crystals. In the case of AI, we're looking at a minotaur's maze worth of code.
> TBH, AI training has validated my data hoarding!

I suppose it could also condemn it, depending on what you mean.
> That last bit is the biggest hurdle to this tech, especially since it relies on a deep learning AI model to interpret the etchings and distinguish focus from noise.

To be clear, this is just deconvolution. That can be done analytically without the use of "AI", but it costs more. You're going to need error recovery anyway. It's less computationally expensive to have a simple neural network (this isn't a massive LLM) get you 95% of the way there, and let your ECC handle the induced errors from that process. Neural networks have been used in simulation and processing for this purpose for decades.
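To make the "let your ECC handle the induced errors" step concrete, here is a minimal Python sketch of a single-error-correcting Hamming(7,4) code. It is purely illustrative: the article does not say which code the glass system uses, and a real archival format would use much stronger codes such as Reed-Solomon or LDPC.

    import numpy as np

    # Systematic (7,4) Hamming code: corrects any single flipped bit per 7-bit block.
    G = np.array([[1,0,0,0,1,1,0],
                  [0,1,0,0,1,0,1],
                  [0,0,1,0,0,1,1],
                  [0,0,0,1,1,1,1]])          # generator matrix [I4 | P]
    H = np.array([[1,1,0,1,1,0,0],
                  [1,0,1,1,0,1,0],
                  [0,1,1,1,0,0,1]])          # parity-check matrix [P^T | I3]

    def encode(nibble):
        """4 data bits -> 7-bit codeword."""
        return (np.array(nibble) @ G) % 2

    def decode(received):
        """7 received bits -> corrected 4 data bits (fixes at most one flip)."""
        r = np.array(received).copy()
        syndrome = (H @ r) % 2
        if syndrome.any():                                    # nonzero syndrome = error detected
            error_pos = np.where((H.T == syndrome).all(axis=1))[0][0]
            r[error_pos] ^= 1                                 # flip the offending bit
        return r[:4]                                          # data bits are the first four

    msg = [1, 0, 1, 1]
    word = encode(msg)
    word[2] ^= 1                                              # simulate one read error
    assert list(decode(word)) == msg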
> To be clear, this is just deconvolution. That can be done analytically without the use of "AI", but it costs more. [...]

That's a LOT of steps to build all of those things, nothing "just" about it. We're talking about whoever finds these 10,000 years from now after all.
I'm not "hung up" on it, I'm pointing it out as a potential problem that needs solving. I WANT this, I think we ALL do. We're not saying "forget about it", we're saying "alright, here's where it needs improvement". What I see here is constructive criticism.A lot of people are really hung up on
1. Speed and density
2. The idea that the only effective solution will always be to keep copying due to degradation and “can’t trust it on a shelf”.
3. That the march of time means nothing will be readable/decipherable/relevant/blah anyway.
1. This will change. It obviously isn’t market ready. Don’t compare it to SSDs either, one of the least appropriate mediums for long term storage.
2. This is reflective of our technological limitations, not a philosophy of best practice. If stable storage was a thing, you wouldn’t need to do it. When was the last time you pulled a book off your shelf to make sure half the pages were still there ?
3. Sure. We should just give up now, right ? Just roll over, decide it’s all too hard and everything’s meaningless. Or should we just pat ourselves on the back that for pointing out obvious issues that rooms full of scientists couldn’t figure out for themselves ?
I feel like I’m taking crazy pills ! A stable medium that can do 5tb in a thin glass slab ? That’s fucking cool. It’s a proof of concept. Take the w.
> Not so much anymore, unfortunately. About 25 years ago I took an astronomy lab class at an old observatory that had been swallowed up by urban sprawl. At the time, they were doing proper-motion measurements using old plates for a century-long baseline, but they said the next-generation satellite would have high enough resolution to surpass anything they could do and end that research program. I assume that means when Gaia started releasing its data it was over.

The solar system doesn't actually stop moving relative to other stars because of satellites. Over 10,000 years Barnard's Star would have moved 14 degrees from its current position.
No, but the resolution possible from space is so much higher that even with only a few years of data the error bars on their measurements are much smaller.
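The conversion behind that drift figure is simple arithmetic: annual proper motion times elapsed years, divided by 3600 to go from arcseconds to degrees. Barnard's Star's catalogued proper motion is roughly 10.3 arcseconds per year, which works out to nearer 29 degrees over 10,000 years; the 14-degree figure above corresponds to assuming about 5 arcseconds per year.

    # Proper motion (arcsec/yr) x time (yr) -> degrees of sky motion.
    def drift_degrees(mu_arcsec_per_yr, years):
        return mu_arcsec_per_yr * years / 3600.0

    print(drift_degrees(10.3, 10_000))   # ~28.6 degrees, using the catalogued value
    print(drift_degrees(5.0, 10_000))    # ~13.9 degrees, the assumption implied by the comment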
I'm curious about the Square Kilometer Array telescope's data storage being used as a benchmark. Will there really be a necessity to store its raw data permanently? I'd liken this to photography or videography, where raw data is only stored for production (on the fastest drives available), and after post-production (processing) you would only keep something like 1-5% of the initial data volume. From what I gather, the 700-petabyte figure is the raw data before processing, but I might be mistaken.
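Taking the quoted 700 PB raw figure and the 1-5% retention guess in that comment at face value, the kept archive would be on the order of a few tens of petabytes:

    # Retained archive size under the comment's assumed 1-5% retention.
    raw_pb = 700                       # the raw-data figure quoted above, in petabytes
    for keep in (0.01, 0.05):
        print(f"keep {keep:.0%}: {raw_pb * keep:.0f} PB retained")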
> That's a LOT of steps to build all of those things, nothing "just" about it. We're talking about whoever finds these 10,000 years from now after all.

You stated it needs "AI", as if that's some insurmountable task. It's neither insurmountable, nor required. It's a more advanced technique they're using because it's easier than the traditional DSP. It also doesn't matter a damn bit. They're calling it a 10kyr storage medium, and because of that, everyone in here is focused on it being used for the recovery of knowledge past the end of civilization. That's certainly an interesting topic, but not the purpose of this tech.
That much we can agree with. On timescales where we aren't trying to communicate our lost knowledge to future civilizations, it's reasonable to conclude they'd be able to rebuild the reading mechanisms easily enough.
Per the example given in the article, this is intended for uses such as archaeological astronomy. An event happens and it's captured by the SKA. We don't know to look for it, so we don't see it. A hundred years from now, we've identified these events, and someone can go back through the old archival data from the SKA to find additional instances of these events. We already do this, going back through old photographic plates, old hand drawings of observations, and old written records from centuries ago. If a storage medium is shelf stable with no maintenance for thousands of years, then it will be the same after hundreds or even tens. Yes, you need to retain the hardware to read it, and the software to decode it, but that's a significantly easier problem than trying to work from the ground up in post-history.
> That much we can agree with. On timescales where we aren't trying to communicate our lost knowledge to future civilizations, it's reasonable to conclude they'd be able to rebuild the reading mechanisms easily enough.

That feels like someone in marketing wanted a big flashy number to release to the press.
But hey, when a company says "this will last 10,000 years", that feels like a mission statement, you know?
> To be clear, this is just deconvolution. That can be done analytically without the use of "AI", but it costs more. [...]

Yay, my original question is addressed! Why is it less computationally expensive to use a neural network compared to ECC? It seems to me that the ECC method would require less computational work, at the cost of storing more bits on the glass to do the error checking with, but I am not a mathematician.

It's an and, not an or. Consider the magnetic storage on a hard drive. You have parallel tracks of data, and as you try to read that data, you will see two-dimensional "blurring" of that data. You can't read a specific point without simultaneously picking up the field from surrounding points. You can take that data stream and reconstruct the value at a single point along it with a deconvolution. The neural network replaces this step, in a crude sense operating as a big look-up table encoded in the weights of the NN. ECC is applied afterwards.
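A toy one-dimensional sketch of the deconvolution step described above, assuming everything: stored bits are smeared by a made-up point-spread function (standing in for the "blurring" from neighbouring marks), and a regularized linear solve recovers them. In the real system a small neural network plays this role, and the residual errors would go to the ECC.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical 1D read channel: each stored bit "leaks" into its neighbours.
    bits = rng.integers(0, 2, 64).astype(float)
    psf = np.array([0.2, 1.0, 0.2])            # assumed point-spread function of the reader
    psf /= psf.sum()

    # What the read head actually sees: blurred signal plus a little noise.
    blurred = np.convolve(bits, psf, mode="same") + rng.normal(0, 0.02, bits.size)

    # Classical fix: build the convolution matrix and solve the linear system
    # (regularized least squares stands in for a proper Wiener filter).
    n = bits.size
    A = np.zeros((n, n))
    for i in range(n):
        for j, w in zip(range(i - 1, i + 2), psf):
            if 0 <= j < n:
                A[i, j] = w
    recovered = np.linalg.solve(A.T @ A + 1e-3 * np.eye(n), A.T @ blurred)

    print("bit errors after deconvolution:", int(np.sum((recovered > 0.5) != (bits > 0.5))))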