Backblaze, owner of 317,230 HDDs, says HDDs are “lasting longer”

Calidore

Wise, Aged Ars Veteran
155
My 320MB (megabyte) IDE IBM hard drive from 1994 still boots Win3.1 and loads Doom II just fine.

Who says older drives are unreliable?
I was very happy using IBM drives, but then it turned out they were following the Wile E. Coyote rule: Soon after I read about their DeathStar drive failures, mine did too.
 

Fatesrider

Ars Legatus Legionis
25,412
Subscriptor
The problem with Backblaze's figures is they only have one type of workload - backup. So as much as it might tell you what backup does to disks, I don't think you can extrapolate this to other uses.
I'm actually OK with that. The IMPORTANT shit is backed up and if those drives last longer, you have less concern about the important shit crapping out.

And if you back up disk images with some regularity, you're back up and running in a couple of hours (at most for most) when your work-a-day drive craps out. If those backup drives last longer, GREAT!

I'll be upgrading my NAS again with some larger arrays, so having more reliable data storage is on my radar for the near future. This is good news for that kind of thing.
 

bdrram03

Ars Centurion
306
Subscriptor++
Fucking VERY justified. I will NEVER have another platter drive in a normal end-user computer. Do I use them in my NAS, and as external backups for my NAS? Sure, but that is a very different use case.

My main drives are solid state, but I still run a couple of platter drives in a mirror; everything of course gets backed up to my NAS/cloud. Maybe if I won the lottery or something I could make this all SSD.

I had a set of 4TB WD NAS drives installed in my NAS and had one failure (which my warranty happily handled). The rest of the drives in the NAS had reached almost 11 years old with no issues before I replaced them. I've since replaced them (ran out of room) with a set of 8TB IronWolf drives, which are approaching 1 year old now with no issues.
 

palant

Smack-Fu Master, in training
3
I wonder how much survivorship bias plays into these figures. Clearly, they had far more significant failure rates around 2013. The ten-year-old drives failing now are the ones that failed neither back then nor in 2021 – the higher quality drives are still part of the evaluation, while the models that failed early on were largely counted in the previous instances of this analysis.

If correct, this would mean that the “bathtub curve” still holds, and it would still show if the question were asked differently. Like: “After how many years did the HDDs fail that were initially turned on in 2015?”
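One way to picture that reframed question is a per-cohort tally. The snippet below is only a sketch: the records are made up, and the function name `failure_ages_for_cohort` and the tuple layout are assumptions for illustration, not anything Backblaze publishes. It groups drives by the year they were first powered on and counts failures by age, so early failures stay in the picture instead of dropping out of later snapshots.

```python
from collections import Counter

# Hypothetical records: (year first powered on, year failed or None if still running).
drives = [
    (2015, 2015),
    (2015, 2016),
    (2015, 2022),
    (2015, None),
    (2015, 2024),
    (2016, 2017),
]


def failure_ages_for_cohort(records, start_year):
    """Count failures by age (whole years) for drives first powered on in start_year."""
    ages = Counter()
    for started, failed in records:
        # Drives that are still running contribute no failure age.
        if started == start_year and failed is not None:
            ages[failed - started] += 1
    return dict(sorted(ages.items()))


cohort_2015 = failure_ages_for_cohort(drives, 2015)
print(cohort_2015)
```

With real data, a bathtub curve would show up as a cluster of counts at the lowest ages, a quiet middle, and another cluster at the highest ages within a single cohort.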
 

valkyriebiker

Ars Tribunus Militum
1,590
Subscriptor
Interesting claim. I have yet to have an SSD fail. Back when I was using HDDs, I had one fail every year or two (I typically have 3-5 drives in my main personal PC). The 7200 RPM drives seemed to fail a lot more, regardless of brand, so before making the transition to SSDs (~13 years ago?), I was sticking with those.

Obviously, my sample size is pretty small, but the high failure rate I was experiencing with HDDs (On 24/7 but definitely closer to a typical consumer workload, otherwise) was just as much reason for me to transition as performance.
I can help with that sample size. As an I.T. guy, I've fixed and/or replaced a lot of computers and their parts over the years.

SSDs are, indeed, far more reliable in day-to-day use with only a few going bad.

But I've R&R'd probably a thousand dead hard drives. It's at the top of the list for most frequently replaced part, with PSUs coming in a distant second, then fans, then RAM chips. I've never had to replace a CPU -- funny, that.

2.5 inch HDDs are also significantly less reliable than 3.5 inch drives, in my observation.
 

Frodo Douchebaggins

Ars Legatus Legionis
12,152
Subscriptor
My main drives are solid state, but I still run a couple of platter drives in a mirror; everything of course gets backed up to my NAS/cloud. Maybe if I won the lottery or something I could make this all SSD.

Oh ok yeah you've got quite a lot more storage in play than I do :ROFLMAO: I think I have 3TB of NVMe storage on my gaming PC and 256GB or 512GB on my MBP
 

EduSantos

Seniorius Lurkius
31
This matches my experience. It has been a long time since I have had a hard drive over 2 years old fail. The last one was a Seagate 3TB Barracuda, a model which was known for its excessively high failure rate (forget which one), which failed at 6 years old.

I have still got a stack of 2TB and 4TB WD and Seagate drives that all function perfectly fine and have no bad sectors. They're just not really useful.
Same over here. Before I always had one or two go through warranty every year, and now I haven't done that in years!
 
In many ways, you can think of a datacenter’s use of hard drives as the ultimate test for a hard drive—you’re keeping a hard drive on and spinning for the max amount of hours,

If you can believe the SMART values for my Seagate Exos drives, running a drive continuously seems to be better for it. There's a value called "Load_Cycle_Count" that increments by 1 each time the drive powers down from inactivity. That seems to be a wear item; my drives dropped to a SMART value of 64 (tens of thousands of cycles) before I realized it was a problem, and set up a script to set park times to never. Since then, Load_Cycle_Count has barely changed, and my reliability rating has stayed right at 64.

So, at least per Seagate, a home environment might be worse than an enterprise one.

edit: and since this was a default value, even in an enterprise, you'd want to set those park values to never. As far as I can tell, it has to be done after every power cycle; I do it with the openSeaChest utility. The relevant script:

Code:
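#!/bin/sh
# Disable the idle_a/b/c power states on every SCSI generic device so the
# heads stop parking. Has to be rerun after each power cycle.
for i in /dev/sg*; do
    [ -e "$i" ] || continue   # skip if the glob matched nothing
    /usr/bin/openSeaChest_PowerControl -d "$i" --idle_a disable
    /usr/bin/openSeaChest_PowerControl -d "$i" --idle_b disable
    /usr/bin/openSeaChest_PowerControl -d "$i" --idle_c disable
done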

That doesn't work on the drive in my external USB enclosure, because the interface chip for that is odd and not widely recognized by IDE utilities. But that's okay in my case, because the mirrored drives in the enclosure (not both Seagate) spin up only once per day, to take a backup from the operational array. One Load_Cycle_Count per day is pretty acceptable, considering that, again, it took tens of thousands of cycles to materially impact the rating.
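For anyone wanting to keep an eye on the same counter, here is a rough sketch of pulling it out of a SMART attribute table. The sample text is made up (it only mimics the column layout that `smartctl -A` prints), and `read_attribute` is a hypothetical helper name; raw values on real drives can carry extra text that this simple parser does not handle.

```python
# Hypothetical excerpt in the column layout printed by `smartctl -A`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10512
193 Load_Cycle_Count        0x0032   064   064   000    Old_age   Always       -       72345
"""


def read_attribute(report, name):
    """Return (normalized value, raw value) for one SMART attribute, or None."""
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the name is the second one.
        if len(fields) >= 10 and fields[1] == name:
            return int(fields[3]), int(fields[9])
    return None


value, raw = read_attribute(SAMPLE, "Load_Cycle_Count")
hours = read_attribute(SAMPLE, "Power_On_Hours")[1]
print(f"normalized {value}, {raw} cycles, {raw / hours:.1f} cycles per power-on hour")
```

Dividing the raw count by power-on hours gives a quick sense of whether the heads are parking several times an hour or roughly once a day.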
 

AlicePlaysWithRockets

Wise, Aged Ars Veteran
133
Subscriptor
Used enterprise drives are the way to go. Cheap, high-performance, large capacity, and they're often available with 3-5 year no-questions-asked warranties. Yes, they're used, but any home server should be running a RAID or have some sort of redundancy. I have been running my unRaid server with nothing but re-certified HGST drives in the array for several years, and I'm never going back to consumer drives.
 

Abulia

Ars Tribunus Angusticlavius
8,439
I had a set of 4TB WD NAS drives installed in my NAS and had one failure (which my warranty happily handled). The rest of the drives in the NAS had reached almost 11 years old with no issues before I replaced them. I've since replaced them (ran out of room) with a set of 8TB IronWolf drives, which are approaching 1 year old now with no issues.
I have 4 2TB WD drives in my NAS still going strong. I went with the "WD Green" 5400 rpm drives. Can't complain. Now, QA comes and goes, so when it's time to replace them I've no idea what I'll go with, but low-power (aka "green") and 5400rpm seemed fine for a NAS environment.

My oldest drive is a WDC WD20EARX with 112000 hours on it, so he's clocked in over 12.5 years of non-stop usage. D:

[Edit] My bad. Looks like the WD spec sheet says they're 7200 rpm! :eng101:
 

Chuckstar

Ars Legatus Legionis
37,457
Subscriptor
I would point out that processes such as TQM tend to address the front end of the bathtub curve most easily, if for no other reason than the feedback is faster. Add that HDDs are now a pretty mature technology, especially related to the parts with the highest tendency to fail. The market has also skewed heavily towards users that will pay up for longevity (and/or do the long-term analysis necessary to find the vendor(s) with better longevity).

EDIT: The point of that last sentence being that there’s less downmarket where they can sell the lower quality stuff, so incentivizing them to increase quality across the board, whereas before there was an incentive to end up with mixed quality and try to bin into separate price points using testing.
 

ZTransform

Wise, Aged Ars Veteran
171
Obviously mechanical drives. Has anyone done any sort of life testing on SSD?
We have many high IOP clusters and the SSDs fail at a higher rate than the spinning rust. It's the nature of SSDs. They are fast, but there are only so many write/erase cycles in them before they are done, even with wear leveling.

However, with the SSDs that have failed, I've only seen a SMALL handful reach their write/erase limit. Most of them have failed due to data integrity errors long before hitting their cycle limit.
 
We have many high IOP clusters and the SSDs fail at a higher rate than the spinning rust. It's the nature of SSDs. They are fast, but there are only so many write/erase cycles in them before they are done, even with wear leveling.

However, with the SSDs that have failed, I've only seen a SMALL handful reach their write/erase limit. Most of them have failed due to data integrity errors long before hitting their cycle limit.
It's also really awkward to get a reasonable figure for 'lifespan' in time for an SSD. Even though endurance tends to be measured in Terabytes Written (TBW), it depends a lot on the type of NAND memory used (SLC, MLC, TLC) as well as the file system the disk is formatted with - i.e. something using a copy-on-write design, whilst more stable in a practical sense, is going to clock up far larger write volumes more quickly.
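To put rough numbers on that, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 600 TBW rating, the 50 GB per day of application writes, and the `write_multiplier` parameter standing in for whatever extra write volume a given file system generates. It is not a vendor formula.

```python
def estimated_years(tbw_rating_tb, host_gb_per_day, write_multiplier=1.0):
    """Years until the TBW rating is used up at a steady daily write volume."""
    if host_gb_per_day <= 0 or write_multiplier <= 0:
        raise ValueError("write volume and multiplier must be positive")
    # Convert GB/day to TB/day, scaled by the assumed file system overhead.
    effective_tb_per_day = host_gb_per_day * write_multiplier / 1000
    return tbw_rating_tb / effective_tb_per_day / 365


# Hypothetical drive rated for 600 TBW with 50 GB of application writes per day.
plain = estimated_years(600, 50)
cow = estimated_years(600, 50, write_multiplier=3.0)
print(f"{plain:.1f} years vs {cow:.1f} years")
```

The same drive and the same application workload give estimates that differ by the full multiplier, which is why a single lifespan-in-years figure is hard to quote.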
 
Used enterprise drives are the way to go. Cheap, high-performance, large capacity, and they're often available with 3-5 year no-questions-asked warranties. Yes, they're used, but any home server should be running a RAID or have some sort of redundancy. I have been running my unRaid server with nothing but re-certified HGST drives in the array for several years, and I'm never going back to consumer drives.
I've generally liked HGST drives a bunch, but when I was setting up my current RAID, I tried ordering a big batch of used 4TB models, which per Backblaze are very good indeed. And those drives were a disaster. Most of them were throwing errors in short order; ZFS really didn't like them. I ended up having to return the whole pile.

I eventually went for new 8TB Exos drives, which have been excellent.

Used drives, even HGSTs, are no panacea.
 

not_secure

Wise, Aged Ars Veteran
172
Subscriptor++
Anecdotal evidence, but I've had two SSD failures total. And for those failures, I think that was a bad Sandisk lot, because I had two other drives from the same line that are still kicking since 2017 but were manufactured at a different time period. My 14-15 year old Crucial C300 is still puttering along just fine. Of course, I have some even older WD Velociraptor 300GB drives that are still working too. That said, cheaper HDDs have had a lifespan of about a decade for me and I simply wouldn't trust them to reliably back up my data past that point. In contrast, my MLC-based Crucial and Samsung drives still have 95%+ health and those are all 10+ years old now.
At my workplace, we're seeing a spate of Samsung SSD failures at the 3 year mark. Mine failed three years to the day of its purchase, which also happens to be when its warranty runs out. Others are failing slightly before and slightly after that warranty date, but it appears to be statistically significant (4 out of 50 purchased at the same time).
 

rapster

Ars Praetorian
446
Subscriptor++
Full disclosure: I’m the delivery guy for hard drives to Backblaze. I always drop the Seagate drives a couple times before taking them in.
Many years ago I worked for a small PC manufacturer. Our product frequently arrived at customer sites with peculiar damage. We were able to replicate this damage by dropping the boxes onto their corners from a 2 meter height, about the height of standing on a delivery truck bed.
 
issue is most people think so... that the problem
The Backblaze stats are good, as long as you remember the way they choose their drives: the absolute cheapest per TB at the time of purchase. So, at any given time, they're generally sampling the worst drives available. If you're buying upmarket from where they are, your drives may do better.

They're the only company publishing stats on this stuff, and as long as you compare models that have a substantial sample size, it's better data than consumers can get anywhere else. Very few of us buy enough drives to have a meaningful sample size; Backblaze has a lot of different drive pools across multiple manufacturers across multiple years. There's just nowhere else to go that's better.

In fact, I strongly suspect their stats may be a big reason why drives are better now than they once were, since so many people use their blog posts for research. Back when, say, Seagate had a high failure rate, BB could see that and point it out, where normal consumers just couldn't.
 

adespoton

Ars Legatus Legionis
10,757
I have a question, but I'm struggling in how to phrase it. In 2013, MTBF and AFR had specific curves. The stats show that when checked in 2021 and 2025, these curves had changed, suggesting drives are more reliable for longer now.

I remember back in 2013 when this was done, they provided the ages of all the drives they were measuring. I presume that by 2021, all of the drives they had, had been replaced and they weren't running any of the same drives. Is that still the case for 2025? It's only been 4 years. I personally haven't had any drive failures since around 2014, and so am still using all the same drives (even though I bought new drives in 2020 and again in 2024 under anticipation that It Was Time). In fact, of all my drives, it's a single drive from my 2020 purchase that has shown an increase in block errors (completely due to overheating) -- but the drive still has failed to, er, fail. Obviously, their data involves magnitudes more devices than my own experience, but I do wonder a) if older devices had failure issues that have since been solved in manufacturing and b) if survivor devices are hanging around longer and now skewing the statistics the other direction.

But as I say, that's not really quite what I mean to be asking. There's some other niggling thing hiding in all that data that I can't quite put my finger on.
 
Is that still the case for 2025? It's only been 4 years.
If you go through to the quarterly reports (dig around a little), they list the average age of each drive model along with its AFR. Smaller models are generally older, and will thus have much higher average months of service. As they rotate the smallest drives out, that also rotates the oldest drives out, so they gradually drop off the standings altogether.

AFRs on newer drives will be inherently better because they're newer, and there's no way to directly compare, in any given blog post, how drives at, say, 60 months in 2025 compare with drives at 60 months in 2015. You'd have to go and try to extrapolate the data yourself by looking at average ages and AFRs in earlier years, and compare those figures with current models. It will be a very rough approximation.

You may be able to download the raw data from them. I haven't looked at it, but it might let you compare drive aging in a much more granular way. Running from summaries is never as good as running from the original data.
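As a rough sketch of what a same-age comparison might look like once you had per-drive service days and failure counts: the tallies below are invented, and the formula is the usual drive-days form of annualized failure rate as I understand Backblaze's reports to define it (failures divided by drive-years, times 100). Treat the names and numbers as assumptions.

```python
def annualized_failure_rate(failures, drive_days):
    """AFR in percent: failures per drive-year of service."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    return failures / (drive_days / 365) * 100


# Hypothetical tallies for drives in their fifth year of service (months 49-60).
cohort_2015 = {"failures": 30, "drive_days": 365_000}  # observed in 2015
cohort_2025 = {"failures": 12, "drive_days": 365_000}  # observed in 2025

afr_2015 = annualized_failure_rate(**cohort_2015)
afr_2025 = annualized_failure_rate(**cohort_2025)
print(f"{afr_2015:.2f}% vs {afr_2025:.2f}%")
```

Restricting both tallies to the same age band is the point: it compares drives at 60 months with drives at 60 months, rather than a young fleet with an old one.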
 
In my sample size of one, I've had more hard drives from my own computers become obsolete and get taken out of circulation just due to limited capacity than from failure. One Seagate desktop drive in a NAS failed -- and it ran 24/7 but the enclosure may have had poor ventilation. Another was a Maxtor OEM drive in a Mac G4 tower. The rest have been a handful of Toshiba portable USB drives, which never really failed outright but acted up intermittently. But by and large, I've only rarely encountered hard drives that failed in my own equipment. Admittedly, I tend to run my desktops 24/7 much like servers, but Windows and hard drive firmware in models from the past 10 years or so tend to want to spin drives down anyway, so there'd still be mechanical/thermal stress.

Now, in my working life, I've seen plenty. Maxtor drives that overheated due to bad design that let the drive basically cook itself is #1, followed closely by laptop drives mistreated in various ways. Of the servers I've tended to, the vast majority have been file and print servers, which tend to soldier on until the hardware is EOL, and everything is migrated to new hardware or a cloud platform. Failures tended to occur in high-use database servers and once in a great while in a multi-disk array used by a hypervisor.

In general, I've had great luck with spinning rust, and still have high-capacity externals around for general storage. But SSDs are at the point now that if I/O performance matters at all, they're economical enough for reasonable sizes that they're a no-brainer. And an all-SSD tower is so much quieter.
 

henryhbk

Ars Tribunus Militum
2,010
Subscriptor++
A LOT of people used to cite Backblaze stats as a reason to not buy any kind of storage made by Seagate, and it was always annoying to me. Thankfully I don't see it as much now.


I've had 3-5 HDDs in all my computers ever (starting from the late 90s, most of which were scavenged/bought used, also I would replace my computers every year or so, so that is a lot of drives) and I think I've had ... four HDD failures, if I count the ones that worked when I took them out but a decade later were dead. Live HDD failures, as in an HDD that failed when it was in use? One.
I think most people's data needs expand before most HDDs hit their MTBF; I know the needs of the RAID that everyone in the house backs up to certainly do. I have had one drive fail in 15 years in the array (annoying, but no loss; I just popped another in and up it came). One failure mode that has mostly gone away: since most RAIDs are now software rather than hardware RAID, there is no longer a RAID controller to fail (yes, the computer can die, but another computer, assuming the array is not HW-key encrypted, can just attach and keep going). I do have it make a backup of the RAID map and partition maps in a separate sector so a file system corruption hopefully won't bork the RAID setup. I am going to have to upgrade from the 12TB drives (Hitachi, if I recall) that are running it to 20s soon, as backups keep getting bigger and more machines hit the RAID (enough bandwidth that I had to move it from the gigabit subnet to my 2.5Gb/s Ethernet subnet).
 

Juvba Fnakix

Ars Scholae Palatinae
613
Subscriptor
I had a string of early failures I eventually attributed to reconditioned Chia coin drives. Seagate do not sell direct in the UK so I worked through their list of authorised distributors. All but the last were sent with completely inadequate packaging (for the last I threatened to send a picture to Seagate if they did not do a proper job and they improvised something tolerable). I was lucky that one of the drives was too busted to spin up, but the others showed as new with SMART. They had huge corrected error rates and started losing data within a couple of months - or days. I eventually fixed the problem by buying a 20TB drive because they are too modern to have been used for Chia.

I am expecting a year of reconditioned ex-Windows 10 SSDs to be sold as new starting soon. If I need a new SSD in the next 12 months I will be checking the manufacturer's website for a way to authenticate devices as new before purchase.
 
There's a strange periodicity to the graph that makes me wonder if we're seeing a methodological problem that's creating aliasing. Why would drives fail most often at "odd.5" ages, and least often at "even.5" ages? I'd expect the individual ticks to show more random noise next to their neighbors, and less of a modulated 2-year sinusoid.
I think people are forgetting a major problem in analysis/statistics: there are many variables that are very obvious but that people forget. One is the fact that higher-density drives take longer to fill certain sections that could be bad, which the drive and file system work to hide. And they don't re-use the same sections anymore. So the larger the drive, the longer it takes to determine whether it is failing, which means there is no true linear curve for HDD usage, and wear from usage takes longer to detect now because of how the file system and internal hardware manage the data. You have to treat all of the periodicity in the data as a black box, which means the curves will be longer and higher on a crunched axis because of the longevity magic the drives try to perform. The data charts can't be compared this way; make it 3D.
 

afidel

Ars Legatus Legionis
18,216
Subscriptor
I think people are forgetting a major problem in analysis/statistics: there are many variables that are very obvious but that people forget. One is the fact that higher-density drives take longer to fill certain sections that could be bad, which the drive and file system work to hide. And they don't re-use the same sections anymore. So the larger the drive, the longer it takes to determine whether it is failing, which means there is no true linear curve for HDD usage, and wear from usage takes longer to detect now because of how the file system and internal hardware manage the data. You have to treat all of the periodicity in the data as a black box, which means the curves will be longer and higher on a crunched axis because of the longevity magic the drives try to perform. The data charts can't be compared this way; make it 3D.
Backblaze fills their drives in days or weeks.
 

sarusa

Ars Praefectus
3,281
Subscriptor++
am happy people are calling out the poor back blaze study
It's not a 'study' - if you were doing a 'study' and could somehow get someone to pay for it you would do things very differently. This is 'hey, we buy thousands of times as many drives as you will ever see in your entire life to get our job (backup) done, we track when they go bad, here's what we saw.'
 
Interesting claim. I have yet to have an SSD fail. Back when I was using HDDs, I had one fail every year or two (I typically have 3-5 drives in my main personal PC). The 7200 RPM drives seemed to fail a lot more, regardless of brand, so before making the transition to SSDs (~13 years ago?), I was sticking with those.

Obviously, my sample size is pretty small, but the high failure rate I was experiencing with HDDs (On 24/7 but definitely closer to a typical consumer workload, otherwise) was just as much reason for me to transition as performance.
I've had an SSD fail, but it was definitely at least partially user error. I forgot to pull the tape off the thermal pad for my mobo's M.2 heat spreader, so it was likely running much hotter than intended.
 
The worst drives for me have been WD by a mile. Hell, I have an old Samsung IDE drive and a first-gen SATA drive that still work.
Anyone with a massive HDD deployment laughs at the data set they produce.
Wrong workloads, unbalanced load configs, etc.
Hell, their original SSD test data set had zero drives with firmware updates applied. Not a single drive they put in the data set had its firmware updated.
But I get that it's some kind of bible for people who use that term. I don't take them seriously.
 

mfirst

Ars Centurion
336
Subscriptor
Something not mentioned in the original blog post or remarked on here is that Backblaze has improved their enclosure design over that time. They've published articles about their bespoke storage server cases and how they've evolved. That includes improved vibration isolation and temperature management. That could be a large confounding factor in them seeing decreasing failure rates in their fleet.
I was wondering something similar about their algorithms for writing and reading data. Do their results assume the drives are spinning at a constant rate 24/7? Maybe a different metric would be failures per billion 360° rotations.
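For scale, here is a quick sketch of what that metric would look like. It bakes in exactly the assumption in question, that the platters spin at their rated speed around the clock, and the failure count and fleet size are invented for the example.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def rotations(rpm, years):
    """Total platter rotations, assuming the drive spins at rated speed 24/7."""
    return rpm * MINUTES_PER_YEAR * years


def failures_per_billion_rotations(failures, rpm, drive_years):
    """Failure count normalized by accumulated rotations, in billions."""
    return failures / (rotations(rpm, drive_years) / 1e9)


per_year_7200 = rotations(7200, 1)
per_year_5400 = rotations(5400, 1)
print(per_year_7200, per_year_5400)

# Hypothetical fleet: 15 failures over 1,000 drive-years at 7200 RPM.
rate = failures_per_billion_rotations(15, 7200, 1000)
print(f"{rate:.5f} failures per billion rotations")
```

A 7200 RPM drive racks up about a third more rotations per year than a 5400 RPM one, so the two would rank differently under this metric than under a per-year rate, and any drive that spends time spun down would be overcounted.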
 