This week, it finally happened. I think it’s the first time in 20 years that a hard drive has died on me without warning. And it was also the first time I was using an NVMe drive, but that could be a coincidence.
The drive was still under warranty (barely a year and a half old), and I even had a spare lying around. But the true cost of restoration is, of course, my own labor. My planning had not been perfect, since I had judged such an event to be unlikely. Still, it was easy enough. I simply installed NixOS from a USB installer and copied my configuration from my backup on my NAS (daily rsync jobs to the rescue). I also copied back all the important files for my home directory. Then it was just a matter of adjusting a few things in the configuration file, rebuilding the system, and voilà. Well, except for a few things that didn’t work quite right for some reason and had to be fixed manually, but nothing major.
However, next time I want this to be even easier. It’s probably overkill to install a RAID controller and run multiple drives in RAID1 or RAID5, but the restoration process is still too much manual work. I was thinking of regularly backing up my main drive at the block-device level, so I would just have to swap out the drive and restore the delta from the backup. I’m not quite sure if that’s feasible or a good idea. For my personal system, I have to balance the investment of preparing for a disaster against the likelihood and impact of such an event. This seems like a good trade-off, but I would be curious to hear how other people prepare for drive failure.
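For what it’s worth, the block-level idea can be sketched in a few lines of Python: compare the disk (or a disk image) with the previous backup image block by block and rewrite only the blocks that changed. This is a minimal sketch under my own assumptions: the 4 MiB block size and the function name are made up, the backup is a plain image file (e.g. on the NAS), and the source should be a snapshot or an unmounted device, because copying a filesystem while it is being written can give an inconsistent image. A real tool would also keep a hash manifest instead of re-reading the whole backup every time.

```python
import os

BLOCK_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB blocks; any fixed size works


def sync_changed_blocks(source: str, backup: str, block_size: int = BLOCK_SIZE) -> int:
    """Bring `backup` up to date with `source`, rewriting only blocks that differ.

    `source` can be a disk image or (as root) a quiescent block device;
    `backup` is a regular image file. Returns the number of blocks written.
    """
    if not os.path.exists(backup):
        open(backup, "wb").close()  # first run: start from an empty image

    written = 0
    offset = 0
    with open(source, "rb") as src, open(backup, "r+b") as dst:
        while block := src.read(block_size):
            dst.seek(offset)
            old = dst.read(len(block))
            if old != block:  # changed (or new) block: overwrite it in place
                dst.seek(offset)
                dst.write(block)
                written += 1
            offset += len(block)
        dst.flush()
        dst.truncate(offset)  # drop leftover bytes if the source shrank
    return written
```

Something like `sync_changed_blocks("/dev/nvme0n1", "/mnt/nas/nvme0n1.img")` (hypothetical paths, needs root) would then only touch the blocks that changed since the last run. Restoring onto a brand-new drive would still be a full copy of the image, since a blank drive has no previous state to diff against.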
RAID is not a backup. RAID is there to keep you running until you can replace the failed drive with the spare on hand.
Software RAID is totally fine if that’s what you want to run; there’s no need for a RAID controller.
Personally, I don’t bother backing up my desktop systems; I only back up my server VMs. You might want to look at using Ansible to automate system deployment. I use Windows on my desktop, so I use Group Policy to configure a bunch of settings to my taste.
I have successfully recovered from dead drives by restoring a Borg backup (borgbackup) onto a fresh drive.
Borg backups take much less space on the backup storage because of extremely efficient compression and deduplication.
The professor who developed it has some presentations on YouTube, and they’re kind of mind-blowing.
So that’s what I would recommend.
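To give an idea of why deduplication saves so much space, here is a toy content-addressed chunk store in Python: data is split into chunks, each chunk is identified by its SHA-256 hash and stored (zlib-compressed) only once, and a backup is just the ordered list of chunk IDs. This is a simplification and not how Borg is actually implemented: it uses fixed-size chunks, whereas Borg uses content-defined chunking with a rolling hash, so inserting a few bytes near the start of a file doesn’t shift every later chunk boundary. The class and method names are made up for illustration.

```python
import hashlib
import zlib


class ChunkStore:
    """Toy content-addressed store: each unique chunk is compressed and stored once."""

    def __init__(self, chunk_size: int = 64 * 1024):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # SHA-256 hex digest -> compressed chunk

    def add(self, data: bytes) -> list[str]:
        """Store `data` and return its recipe: the ordered list of chunk IDs."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            chunk_id = hashlib.sha256(chunk).hexdigest()
            if chunk_id not in self.chunks:  # duplicate chunks are stored only once
                self.chunks[chunk_id] = zlib.compress(chunk)
            recipe.append(chunk_id)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Reassemble the original data from its chunk IDs."""
        return b"".join(zlib.decompress(self.chunks[cid]) for cid in recipe)

    def stored_bytes(self) -> int:
        """Total compressed size actually kept in the store."""
        return sum(len(c) for c in self.chunks.values())
```

If you add two nightly "backups" that differ in only one chunk, the second one costs just that one new chunk plus its recipe, which is why a long history of mostly identical backups stays small.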
I back up all my computers and servers with borgmatic, which makes it a bit easier to manage excluded directories and how many versions you’d like to keep.
If you need any help with setting it up, let me know.
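The "how many versions to keep" part maps to rules like `borg prune --keep-daily/--keep-weekly/--keep-monthly`, which borgmatic lets you set in its config. Below is a simplified Python sketch of that kind of rule: for each period type, keep the newest snapshot in each of the last N periods that have a snapshot, and keep the union of everything selected. It’s an illustration under my own assumptions, not Borg’s exact pruning algorithm; the rule names `daily`/`weekly`/`monthly` and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def period_key(ts: datetime, period: str) -> tuple:
    """Map a timestamp to the calendar period it belongs to."""
    if period == "daily":
        return (ts.year, ts.month, ts.day)
    if period == "weekly":
        iso = ts.isocalendar()
        return (iso[0], iso[1])  # ISO year and week number
    if period == "monthly":
        return (ts.year, ts.month)
    raise ValueError(f"unknown period: {period}")


def select_kept(snapshots: list[datetime], rules: dict[str, int]) -> set[datetime]:
    """Return the snapshots to keep; everything else could be pruned."""
    kept: set[datetime] = set()
    newest_first = sorted(snapshots, reverse=True)
    for period, count in rules.items():
        seen: set[tuple] = set()
        for ts in newest_first:
            key = period_key(ts, period)
            if key in seen:
                continue  # a newer snapshot already represents this period
            if len(seen) == count:
                break  # this rule has already covered `count` periods
            seen.add(key)
            kept.add(ts)  # newest snapshot of a not-yet-covered period
    return kept


if __name__ == "__main__":
    # 60 daily snapshots at noon, 2024-01-01 through 2024-02-29
    start = datetime(2024, 1, 1, 12)
    snaps = [start + timedelta(days=i) for i in range(60)]
    for ts in sorted(select_kept(snaps, {"daily": 7, "monthly": 3})):
        print(ts.date())
```

With those rules the demo keeps the last seven days (2024-02-23 to 2024-02-29) plus 2024-01-31 as January’s monthly snapshot. Only two months exist, so the third monthly slot stays empty.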
Thanks, that looks interesting! I wonder how that compares to something like btrfs snapshots. How easy is it to restore a whole disk as opposed to files and directories?
You don’t need a RAID controller. I have two NVMe drives set up as RAID1 and boot off the RAID1 partition. The only partition I can’t put in the RAID is the EFI partition, because the BIOS doesn’t know about the RAID, so I simply duplicate it by hand onto both drives using dd. Since it only gets updated on kernel updates, that just adds a dd step to the kernel upgrade process.
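If you want to script that dd step (say, as a hook after kernel updates), here is a minimal Python sketch of the same idea: copy one partition byte for byte onto the other, then verify the copy with SHA-256. The function names and example device paths are hypothetical, it needs root for real devices, the target must be at least as large as the source, and writing to the wrong device destroys it, so double-check the paths. The ESP also shouldn’t be written to while it’s being copied.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # copy in 1 MiB pieces


def file_digest(path: str, length: int) -> str:
    """SHA-256 of the first `length` bytes of a file or device."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        remaining = length
        while remaining > 0:
            chunk = f.read(min(CHUNK, remaining))
            if not chunk:
                break
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def mirror_partition(src: str, dst: str) -> int:
    """Copy src byte for byte onto dst (like `dd if=src of=dst`), then verify."""
    copied = 0
    # "r+b" writes into the existing target without truncating it,
    # which is what you want for a block device.
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        while chunk := fin.read(CHUNK):
            fout.write(chunk)
            copied += len(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # make sure the data actually hits the disk
    if file_digest(src, copied) != file_digest(dst, copied):
        raise OSError("verification failed: destination differs from source")
    return copied
```

Called as e.g. `mirror_partition("/dev/nvme0n1p1", "/dev/nvme1n1p1")` (hypothetical device names), it does the same job as the manual dd, plus a check that both copies really match.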
Yeah, I figured as much. But with my mainboard I would take a big performance hit, since it doesn’t run at full speed when both M.2 slots are occupied. How do you manage the RAID, by the way? Is that all handled by the BIOS?