Some Thoughts on a Good Digital File Storage Strategy

1.0 Introduction

In ye olde days, we had collections of LPs taking up a large amount of space; then we re-purchased all of them as CDs… and had large collections of CDs taking up quite a lot of space. These days, I hope, we've ripped all our CDs to digital music files taking up no space whatsoever, except in the sense that they reside on a hard disk platter 'somewhere' (in my case, in some computers running in my loft!).

When you switch to digital file storage, however, you need to make sure your storage strategy is really watertight, otherwise you'll wake up to discover that all those CD rips you thought you owned are, for one reason or another, lost forever.

Chances are, too, that all the above is true for your photographs. Where once we had tonnes of thick photo albums, now we have a virtual pile of JPGs or NEFs sitting on a hard drive. Lose those, and your wedding, christenings, birthday parties and holiday snaps are consigned to your wet-ware (and fallible!) memory for all time thereafter…

How, then, do we store digital assets securely, for the long-term?

The answer, I think, is quite complex but can be boiled down to the following general points:

  • Store on ZFS
  • Use ECC memory
  • Use multiple servers
  • Use multiple disks in arrays
  • Use multiple operating systems
  • Copy between servers at different rates and at different times
  • Use 'The Cloud'
  • Automate your backups
  • Automate periodic checks of the integrity of your audio files

Let me elaborate a little on each of those bullet points in turn.

2.0 Store on ZFS

Bit-rot is 'a thing', and something you don't want to experience. A digital file can have some of the individual bits which make it up 'flipped'. What was originally a 1 gets zapped by a cosmic ray to be a 0, or a 0 is flipped to a 1 because your hard disk is a bit knackered and doesn't get written to or read from as perfectly as it once was. If your digital file is of music, a flipped bit can sound like a pop or click; if it's a photograph, it can appear as an odd spot on the picture, or a patch of weird colour.

So bit-rot is not good and you don't want to experience it.

To prevent it happening, the file system in which your digital files reside needs to be able to detect when it's happened and to reverse it by copying good data back over the bad.

There is no such file system for the Windows operating system. Microsoft developed ReFS with some bit-rot-aware capabilities, but has since back-pedalled on its general availability, so its future is highly uncertain. Neither NTFS nor FAT32 has bit-rot-aware capabilities of any sort, so they are unsuitable file systems for storing anything you really care about. Besides, Windows server operating systems are expensive.

Linux has two bit-rot-aware file systems: ZFS and Btrfs. Btrfs's future is a little uncertain given Red Hat's decision to deprecate it in future versions of their Enterprise-class distro. It also has several known issues with 'data redundancy' configurations of hard drives, such as RAID1, RAID10, RAID5 and RAID6 -configurations which are vital to ensure data resilience in the long term. I couldn't, therefore, really recommend using Btrfs.

Which leaves ZFS as the only bit-rot-aware file system that Linux can use without drama in all RAID1, 10, 5 and 6 configurations. Technically, licensing issues mean that 'ZFS on Linux' (ZoL) cannot ship as a native feature of any Linux distro, but it can be deployed on nearly all of them with little fuss. I would, however, raise two notes of caution.

First, ZoL is supplied as a kernel module, so if the Linux kernel changes because of an update/upgrade, ZFS will simply stop working until the ZoL developers catch up and release a newer version of their software to match the newer kernel. This means fast-moving, bleeding-edge distros such as Fedora or Arch are questionable platforms on which to depend on ZFS, since their ZFS implementations will frequently stop working for weeks or months on end until the ZoL developers do their work. However, slower, more stable distros (such as CentOS or Ubuntu LTS releases) are probably acceptable in this regard.
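
If you do run ZoL on one of those slower distros, it's worth checking after every kernel update that the ZFS module has actually been rebuilt for the new kernel before you reboot into it. A rough sketch of that check (it assumes the DKMS packaging of ZFS on Linux, which isn't the only way it can be installed):

  # Which kernel is currently running?
  uname -r

  # Has DKMS built the ZFS module for the installed kernels?
  # (only applies if ZoL was installed as a DKMS module)
  dkms status zfs

  # Is the module present and loaded?
  modinfo zfs | grep -E '^(version|vermagic)'
  lsmod | grep zfs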

Second, however, is a known issue: Linux kernel versions 5.0 and up have removed some long-deprecated kernel functions on which ZoL relies. Practically, this means that, at the time of writing, ZoL will not work at all on any Linux distro that uses, or is upgraded to use, a 5.x kernel. No doubt the ZoL developers will eventually come up with alternative ways to achieve the required functionality -but it's an open question whether their re-worked software, when it comes, will work as well as the versions that used the now-removed kernel functions.

Short version, therefore: you can't use Windows for your storage servers. You can use Linux for your storage servers, but only so long as the distro concerned isn't bleeding edge and doesn't update to a 5.x kernel any time soon. Practically, this means that CentOS 7.6 and Ubuntu 18.04 are suitable O/S platforms to use, until around October 2022 (when the upstream Red Hat Enterprise Linux 7.6 stops getting security updates) or April 2023 (when Canonical's standard support for Ubuntu 18.04 ends).

There are non-Windows and non-Linux alternatives you could consider, too: OpenIndiana is an open source continuation of OpenSolaris (via the illumos project), so it's free to use and deploy, gets support and updates …and ZFS works natively on it without the sort of kernel worries that might affect Linux in the future. However, Solaris in any flavour or guise is a fairly heavy-weight operating system and might not run well on the sort of underpowered hardware you're likely to be storing your digital assets on at home!

Fortunately, the various flavours of BSD are much more lightweight, are equally free and open source, fully supported …and run ZFS natively (in the sense that ZFS is part of the core O/S and doesn't need to be bolted on afterwards. Being part of core O/S functionality, it doesn't suddenly stop working when the core O/S gets updates or upgrades). FreeBSD is probably the best candidate out there for this role, in that it has a good reputation for stability and security, whilst not being completely bizarre to install!

So, summing all that up:

  • you need to use ZFS as your file system, because bit-rot is a problem
  • that means you can't use Windows for your file storage
  • you can use slow-release forms of Linux, such as CentOS or Ubuntu LTS
  • you can use OpenIndiana if your hardware is reasonably good
  • you can use FreeBSD (or any other BSD flavour, if you prefer)
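
By way of illustration, here's roughly what putting ZFS to work looks like once it's installed on any of those platforms. The pool, dataset and device names below are invented for the example; substitute your own:

  # Create a pool called 'tank' from two whole disks arranged as a mirror
  zpool create tank mirror /dev/sdb /dev/sdc

  # Create a filesystem within the pool to hold music files
  zfs create tank/music

  # Periodically ask ZFS to re-read every block and verify its checksums,
  # repairing anything found to be corrupt from the mirror copy
  zpool scrub tank

  # Check the outcome: repaired or unrepairable errors are reported here
  zpool status -v tank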

3.0 Use ECC Memory

Error-Correcting Code (ECC) computer memory is more expensive than “regular” RAM. But ECC RAM can detect and repair most of the common kinds of in-memory corruption. Such in-memory corruption can happen for all manner of reasons, ranging from hardware malfunction to cosmic ray strikes (really).

In short, ECC RAM is required to allow a computer to detect and correct data corruption taking place in memory, long before that corrupt data gets written to disk. Without ECC RAM, corrupted data will be written to disk, where even ZFS won't be able to tell that it's 'wrong'. With ECC RAM, however, your server will keep the data in memory non-corrupt, so non-corrupted data will be written to disk. Once on disk, ZFS will detect if that known-good data gets corrupted at any time in the future and correct it if it does.

ECC RAM is, therefore, the twin sister of a decision to use ZFS as your file system: you can use ZFS without ECC RAM, but it's a little bit pointless to do so, since you'd then be asking your file system to ensure non-corrupt data when you can't really trust your server hardware to keep data non-corrupt in the first place.

Short version, therefore: If you care about your digital data being preserved in non-corrupt form in the long term, make sure you use ECC RAM.

Some consequences flow logically from the decision to use ECC RAM, however. It means you cannot use 'consumer grade' hardware, since most motherboard manufacturers for mainstream desktop PCs do not design their products to use ECC RAM. Nor can you use most consumer-grade Intel CPUs, since they don't support ECC RAM either. (For example, read the reply to the last question on the relevant Intel support page.)

On the other hand, it's quite easy to find modestly-priced hardware which does support ECC RAM. For example, the HP ProLiant Microservers have always done so, and can be obtained relatively cheaply.
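
If you're not sure whether a machine you already own actually has working ECC RAM, you can usually find out from a running Linux system. A small sketch (the exact wording of the output varies by vendor, and dmidecode generally needs root):

  # Ask the DMI/SMBIOS tables what kind of error correction the memory uses;
  # look for something like 'Error Correction Type: Multi-bit ECC'
  sudo dmidecode --type memory | grep -i 'error correction'

  # If the kernel's EDAC subsystem has found an ECC-capable memory controller,
  # it will show up under here
  ls /sys/devices/system/edac/mc/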

4.0 Use Multiple Servers

So, you make the decision to use ZFS and ECC RAM, and that pretty much mandates you go out and buy a 'proper' server (such as the HP Microservers previously mentioned). Is all your digital data now safe?

Well, it's certainly safe from memory and file system corruption. But the short answer to that question has to be, 'No'.

Server hardware fails. Houses get burgled and burn down. Rather less dramatically, users can be prone to accidentally deleting a file when they meant to rename it; or to saving a word processor file over the top of a rare music file. Your digital storage strategy needs to protect against disasters and users, basically -and if all your digital 'eggs' are in only one server 'basket', you can't really do that very well.

It is certainly possible, for example, to build a server with so many hard disks that you could logically 'partition' the server into two separate storage areas -and then copy data from storage area 1 to area 2 on a regular basis. One server could thus host two separate and independent data sets, which is redundancy of sorts. However, those two data sets are still stored in the one server, so if that experiences significant hardware failure -or simply gets nicked by a local ne'er-do-well- your data is (at least temporarily) toast.

So instead, I'd recommend a modest investment in at least two and ideally three separate servers. Make one of them the 'source of truth' for everything, then have it copy its data set to the other two servers at different times. The other two servers thus provide hardware-independent, redundant storage of your 'true' data.

Naturally, thieves or house fires can make just as short work of three co-located servers as they could of one …but at least you're unlikely to have all three servers die of hardware failure at the same time.

5.0 Use Multiple Disks in Arrays

Rather wonderfully, we live in a world where it's perfectly possible -if your pockets are deep enough!- to buy a single, enormous hard drive that's probably big enough to cope with most people's music, photograph and movie collections. But it's another case of eggs-in-one-basket: if that hard drive fails, you lose everything.

Besides that, ZFS can only really do its 'corruption-prevention' act when it has multiple, independent disks to work with. Put simplistically, that's because it will store a music file (say) on one disk, but store the parity data about that file on a second. If the parity disk dies, ZFS can re-calculate the parity by re-reading the data file; if the data disk dies, ZFS can re-construct it by using the surviving parity data. I am over-simplifying a lot, but the general principle is undoubtedly true: ZFS works best when it's applied on top of an array of multiple, independent hard disks. That multiplicity of hard disks makes it possible for ZFS to properly protect data from corruption and to fix it when corruption is detected. ZFS on a single drive is certainly possible, but then it can only detect corruption, not fix it, which is somewhat less than helpful.

In terms of what sort of disk arrays to use, I think your budget will come into play at this point, since hard disks of reasonable capacity are not negligibly cheap.

As a case in point, whilst my main servers run with 4 x 4TB NAS-class hard disks, I couldn't afford to buy three sets of them. But I did have some 3TB NAS-class drives hanging around from previous builds, and I also had a stack of desktop-class 2TB drives going spare. Accordingly, with three servers to build, I made my front-line 'source of truth' system run 3 x 2TB drives in ZFS's equivalent of a RAID-0 configuration. This is a stonkingly stupid thing to do: RAID-0 means that if any single drive fails, the entire array is lost. But it also means I got 6TB of usable space out of the 2TB drives, which is basically the amount of digital data I have that I care about -and my other two servers are there to provide backup for the data that would die if this first-line array dies.

Using my four 4TB drives, I could construct ZFS's equivalent of RAID-10: that is, one pair of drives is set up as a mirror; the other pair is set up as another mirror; and data is then striped across the two mirrors. It's an extremely safe way of running things: I should be able to lose two disks on different sides of the stripe and still have usable data. It also means that 16TB of 'raw' disk space becomes just 8TB of usable space, since two of the four disks are merely mirroring the contents of the other two.

And with my four 3TB drives, I could construct ZFS's equivalent of RAID-6, in which data is striped across all the drives, but the capacity of two of them is given over to parity data used to recover from detected corruptions. This means I can lose two disks and still access my data, but it also means I only have 2 x 3TB of usable disk space… but that's still 6TB.
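
For what it's worth, the three layouts just described translate into ZFS commands along these lines (pool and device names are again invented for the example):

  # Front-line server: 3 x 2TB simply striped together (the RAID-0 equivalent):
  # maximum space, but losing any one disk loses the lot
  zpool create frontpool /dev/sda /dev/sdb /dev/sdc

  # Second server: 4 x 4TB as two mirrors striped together (the RAID-10 equivalent)
  zpool create mirrorpool mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd

  # Third server: 4 x 3TB with double parity (raidz2, the RAID-6 equivalent)
  zpool create paritypool raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd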

So, one array is unsafe, but has 6TB of usable space; one is very safe and has 8TB of usable space; the third is also very safe and has 6TB of usable space. One way or another, therefore, I have redundancy, recovery from detected data corruption, resilience from disk failure and a minimum of 6TB of usable space.

Of course, if I were Lord Moneybags, I'd probably go out and buy 11 or 12 brand new 10TB disks and bask in my practically-unlimited storage capabilities. But there are limits to my spending abilities, so I make do instead! So long as my music, photographs and movies are safe, I'm fine.

Regardless of my specifics, though, the short version of this point is: you need lots of disks, bought at different times (so you don't get all the disks from the same bad Friday afternoon batch), in a variety of configurations to provide 'defence in depth'. No one array should be considered a 'definitive' source of safety, but in combination, your multiple disk arrays probably can be.

6.0 Use Multiple Operating Systems

Just as I don't rely on one disk manufacturer, one disk production batch or one server, so I don't rely on the capabilities of just one operating system. I've already mentioned that Linux is due to have serious problems with ZFS when it widely adopts the version 5.x kernel: hitching everything you've got to the Linux post, therefore, would seem to be a less-than-sensible approach. On the other hand, whilst OpenIndiana uses ZFS natively and has none of the licensing qualms about it that make ZFS on Linux at times problematic, it's an extremely niche operating system whose long-term future has to be in some degree of doubt. Meanwhile, FreeBSD also has no issues with ZFS, and has a strong developer and user community, which seems to suggest its future is in safe hands.

So why not just use FreeBSD for everything? Simply because I don't want to trust all my digital assets to the decisions of one group of developers!

I'd also add that if your multiple servers are of different hardware capabilities, you might find adopting a single O/S problematic. OpenIndiana, for example, is a heavyweight and runs appallingly slowly on some of my kit, but acceptably on other bits of hardware. It wouldn't make sense, therefore, simply to put OpenIndiana on every server I own.

Personal experience comes into this, too, and can affect your choices. For example, I have a need to share my music files around the house to various devices, some running Windows and others Android or Linux. The simplest way of achieving that is to use SAMBA file sharing -and I am far more familiar with setting up SAMBA on Linux than I am on OpenIndiana or FreeBSD. I could, of course, learn to do it on both those operating systems -but if you need a home network *now*, you go with what you already know, rather than what you might be able to achieve in the future! Accordingly, one of my servers runs CentOS, because it can do ZFS quite well for now, but can also do SAMBA very easily. Another runs FreeBSD, because it runs ZFS brilliantly and doesn't require massive hardware resources to do so… but I would hesitate to set up SAMBA on it today. And a third runs OpenIndiana, because I like Solaris, the box it's on is reasonably capable, it runs ZFS natively and I don't need to share its data out to anyone else.

Short version here, then, is: be flexible in the operating systems you use. Be prepared to use more than one, because that way you spread risk. Tailor your operating system choice to your hardware capabilities, your skill levels and to the nature of the functionality you require from your home digital network.

7.0 Use a complex copying strategy

So, we've established we need multiple servers and that we'll copy data between them to provide multiple, redundant copies of the digital data we care about. My suggestion at this point is that this data copying between servers needs to be rich and complex. It's not a question of just copying everything from box A to box B every night at 9pm, for example. What if you were to delete a large chunk of your music at 8pm and didn't notice until, say, 10pm? Well, you'd be screwed, because the data would also have been deleted off box B by the time you realised there was a need for data recovery!

So, yes, you need to copy from A → B, and from B → C, but I'd stagger it so that the copy from A → B happens, say, every other day, and that from B → C happens once a fortnight. Then, if you screw up box A and notice within a day or so, you can recover by copying back from box B. Yes, you'd lose some data, but only a day's worth. If you screwed up box A and didn't notice for maybe a couple of days, you'd still be able to recover by copying back from box C, given that the screw-up would probably have replicated to box B by then.

Get the idea? By having copies taking place between servers at different schedules, you give yourself a 'depth' of data that allows for long-reach recoveries.

You can then make your redundancy strategy even richer by distinguishing between a 'copy' and a 'synchronisation'. Put briefly, a 'copy' is “copy everything from box 1 to box 2”, whereas a synchronisation is “copy everything from box 1 to box 2 and if a file is present on box 2 that isn't on box 1, delete it from box 2”.
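
In tooling terms -rsync is the obvious candidate, though it's not the only one- the distinction boils down to a single flag. A sketch, with made-up paths and host names:

  # A 'copy': everything on box A ends up on box B, but nothing is ever
  # deleted from box B
  rsync -av /srv/music/ boxB:/srv/music/

  # A 'synchronisation': as above, but files that no longer exist on box A
  # are also deleted from box B
  rsync -av --delete /srv/music/ boxB:/srv/music/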

For example, say I take a photograph and save it as “rome_1.jpg” on server A. That night, a copy between A and B takes place. So now we have this:

Server A              Server B
rome_1.jpg            rome_1.jpg

The next day, I rename the file on server A to be “rome-at-night.jpg”. So now we have:

Server A              Server B
rome-at-night.jpg     rome_1.jpg

The next night, the copy from server A to B takes place:

Server A              Server B
rome-at-night.jpg     rome_1.jpg
                      rome-at-night.jpg

That is, we did a mere “copy”, which means “take what's on server A and put it on server B”. It does not mean “delete anything from server B”. So Server B ends up with two files, whereas A only has one. Imagine, though, that instead of merely re-naming my file, I'd done a lot of Photoshop work on it: then Server B would have both the original and the worked-on photo, while Server A only has the worked-on version. Notice, therefore, that this sort of copying basically builds up a file history on Server B. You now take a look at your photo on Server A and think, 'God, I really need to learn how to use Photoshop better! I wish I could get the original photo back and try again!!'… and sure enough, you can do that sort of recovery, because Server B still retains the original photo.

Now, eventually, you synchronize server A and B, so maybe after a week, you end up with this situation:

Server A              Server B
rome-at-night.jpg     rome-at-night.jpg

A synchronization means “delete from B if it isn't present on A”, so we finally lose the original version of our photo. But if you copy nightly, and only synchronize (say) weekly, then you'll have had the ability to revert to the original photograph for up to a week before, eventually, the synchronisation takes that option away from you.

But remember you should probably have a Server C, too, right? So server B could copy to server C weekly and synchronize to server C only once a month. Arrange things like that, and you could recover your original photograph for up to a month, since the original wouldn't be lost from server C until that B → C synchronization takes place.

The short version here, therefore, is: if you stagger your copies between servers so that they take place at different rates; and if you mix 'copies' with 'synchronizations', you can arrange things so that you can retrieve data from a variety of places, potentially going back weeks or months.

Naturally, the servers holding on to lots of copies of files before they get cleaned out with a synchronization need more disk space than those that are synchronized more frequently. But that just means you tailor your server roles to the amount of disk space you have available. The one with most space becomes server C; the one with least must be server A. If you're lucky enough to have all servers configured identically with equivalently-large disk capacities, fine: you don't need to have your choices restricted in that way. But so long as you abide by the principle of 'defence in depth', by multi-copying and multi-synching between multiple servers, you'll be able to recover accidentally lost or corrupted data with relative ease.

8.0 Use 'The Cloud'

Nothing I've described so far, no matter how subtle or complex it is, will save you from your house burning down. Likewise, if all three of your servers are sitting in your basement and it floods, all your data is gone!

In short, multiple servers copying data between themselves is a big step in protecting your data, but when those servers are co-located, they remain vulnerable to co-destruction -and bang goes all your data at that point.

Therefore, it's important to get your data away from your house, somehow. This can be accomplished in a number of ways. For example, I have a couple of 4TB external USB drives onto which I periodically copy key parts of my digital data (4TB isn't big enough to store it all, so I have to get selective). When I next visit my sister in Reading, the drive goes with me and gets swapped with the one I left with her a few months previously. The data is only stored on the ext4 file system, so it's not resilient; it's also a single drive, so if the hardware gives up the ghost, that particular backup is toast. But it's at least a form of off-site backup that protects, to a degree, most of my really key data.
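
A minimal sketch of that kind of selective copy to an external drive might look like this (the device, mount point and directory names are made up for the example):

  # Mount the external drive (here assumed to be /dev/sde1, formatted as ext4)
  mount /dev/sde1 /mnt/offsite

  # Copy only the data sets that matter most and will actually fit
  rsync -av /srv/photos/ /mnt/offsite/photos/
  rsync -av /srv/music/  /mnt/offsite/music/

  # Unmount before unplugging, so everything is properly flushed to the drive
  umount /mnt/offsite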

Another way to achieve the same sort of thing is to adopt some form of 'cloud storage'. For a relatively modest £80 per year, for example, I can buy a Microsoft Office 365 Home subscription. That gives you Office software for up to 6 people (which I have no need for, being a Linux user!), but each one of those 6 accounts is then also given 1TB of cloud storage. Now a single terabyte is just about big enough for my music collection; it's also just about big enough for my photographs. The movie collection won't fit in a single 1TB, but I could spread it across the remaining 4TB of available cloud storage if I cared enough (I don't!).

Thus, whatever my servers might be copying and synchronising between themselves, I get one of them to link to my Office 365 accounts and thereby get a copy of its data wherever Microsoft deigns to store it.
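
How that link is made is a topic in itself, but one tool capable of doing it from a Linux server is rclone, which talks to OneDrive (and most other cloud providers). A sketch, assuming a remote called 'onedrive' has already been set up with 'rclone config':

  # Do a dry run first, to see what would be transferred
  rclone sync /srv/music onedrive:music --dry-run

  # Then push the music collection up to the OneDrive account for real;
  # nothing on the local server is changed
  rclone sync /srv/music onedrive:music --progress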

There are caveats galore here, of course. You're sharing your data with some MegaCorp, for starters. You could use some sort of encryption to protect your privacy, of course, though as far as my music is concerned, if Microsoft wants to listen to a performance of Götterdämmerung, they're more than welcome to do so! But Microsoft, like any cloud provider, could change its contract terms at any time; change the pricing to extortionate levels; decide to get out of the cloud business altogether… you get the idea. Using someone else's services makes you dependent on that someone else, and that's not a great place to be when it comes to taking reliable, off-site backups. You might also live somewhere where data caps or Internet speeds make storing even 1TB in the cloud a practical impossibility.

Fair enough: cloud storage might not be ideal for you and in that case, don't use it. But if you can, you'll find it an invaluable way of putting at least some of your slower-moving and non-personal data out of harm's way. In any case, I don't use my Office accounts as my only off-site backup technique, as my sister will readily attest; but it's an important component in my data preservation strategy nonetheless.

9.0 Automate File Operations

Copying files between servers and doing periodic synchronizations is too important to be left to you remembering to do them! They need to happen to a rigorous schedule in a completely automated manner: if you don't know how to knock a bash script together or use crontab to achieve these things, then you'll need to learn how (and this site might be able to help you!).
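
As a sketch of what that automation might look like in a crontab, with the script names and the exact schedules invented purely for illustration (the point is the staggering):

  # m  h  dom   mon dow  command
  # Copy from A to B every other day at 2am
  0    2  */2   *   *    /usr/local/bin/copy_a_to_b.sh
  # Synchronise A to B once a week, early on Sunday morning
  0    3  *     *   0    /usr/local/bin/sync_a_to_b.sh
  # Copy from B to C on the 1st and 15th of each month
  0    4  1,15  *   *    /usr/local/bin/copy_b_to_c.sh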

10.0 Automate File Checks

Audio files, in particular, need to be automatically checked for internal integrity on a regular basis. (I am unaware of any similar checks that could be applied to photographs or videos, short of visual inspection). FLAC provides an integrated way of doing this. Thus, with a bit of bash shell scripting and some light work in crontab, you can automatically scan every one of your music files for internal integrity maybe once a month. If you find that, despite your best endeavours with ZFS and ECC RAM, one or more files have inexplicably developed playback errors, you still have servers B and C from which you should be able to restore good versions of those files.
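
A minimal version of that monthly check might look like this, run from cron (the directory path and log location are examples only):

  # 'flac --test' decodes each file in test mode and verifies its internal MD5
  # checksum without writing any output files; --silent suppresses the
  # per-file progress, so only errors end up in the log.
  find /srv/music -name '*.flac' -print0 | \
      xargs -0 flac --test --silent 2> /var/log/flac_check.log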

11.0 What? No NAS?!

Everything I've written so far makes protecting your digital assets sound complex, possibly expensive and an exercise in Unix administration. Pretty accurate so far, I think! In the old days, I imagine being an audiophile meant knowing your way round the technical underpinnings of a reel-to-reel tape recorder, or knowing how to tell whether Amplifier A was adding distortion to your signal because one of its valves was blown. The need for expertise hasn't gone away, in other words; it's just transformed. Fortunately, I happen to think that knowing how to throw three servers together in data-resilient configurations is something pretty much anyone can learn, if they're under 90!

But someone is always going to mention off-the-shelf NAS products: a simple box of tricks you take out of its packaging, stick in some hard drives and you're off. Why aren't I recommending these?

Well, first off: I can't recommend them, because I have never used one; I have always gone home-brew NAS rather than shop-bought.

But, with the understanding that I'm not talking from first-hand experience, I'm not going to recommend these sorts of product because:

  • They are quite expensive
  • They are frequently poorly-performing
  • They usually can't use ECC RAM
  • They never run ZFS

The last two points in particular mean these sorts of product are completely out of the race for being a really resilient way of protecting your treasured digital assets. They would certainly be better than nothing, but wouldn't be something I'd trust my data to long-term.

Besides that, there's an additional factor: they take you away from the business of caring for your data. They abstract everything into a beautiful 'product', thus encouraging a sense of security which is, fundamentally, quite false. Your data should be safe because you know you made it so, not because you rely on a product you think will make it so.

It is, perhaps, a counsel of perfection; but it's one I firmly believe to be the case: the only true guarantor of your data's integrity and safety is you, your skills… and your care.
