Best Filesystem for Securing Data Against Corruption Due to Power Loss

I am running a small uClibc- and BusyBox-based embedded system on an x86 device. I am using an initramfs, but I am also mounting a custom ext3 filesystem on a CompactFlash device in IDE mode, which I use to store persistent measurement logging data created by a custom-written C++ application. I chose ext3 because a couple of books I have read (Building Embedded Linux Systems by Karim Yaghmour and Embedded Linux Primer by Christopher Hallinan) recommend it for safety against power loss when using CF drives in IDE mode. This is particularly important because the data is critical.

However, based on some of the comments on my previous question, "Confusion with how to restore corrupt ext3 files if power outage occurs during a file write", it would appear that this filesystem does not in fact guarantee safety against data corruption due to power loss. So I would like to know:

  1. Is ext3 actually the best choice for this setup?
  2. Does power loss during a disk write operation only corrupt the portion of data I am periodically appending to the file, or can it corrupt the entire file?
  3. Is data that is not being written at the point of power loss completely safe? In particular, is there any risk that my initramfs.cpio file can become corrupt also?
  4. Is there any method I can use in my application code to protect the data (e.g. creating an extra partition and writing my data to mirror images so that there are always two copies)? Speed is not a real issue for my application, so expensive copying operations are acceptable.
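
One way to sketch the mirrored-write idea from question 4 inside the application is shown below. It is a minimal C++ example, assuming a POSIX C library (uClibc provides `open`, `write`, `fsync` and `close`) and that the caller passes one log path per copy; the paths and the `mirrored_append` name are hypothetical. Each record is written and `fsync()`ed to one copy before the next copy is opened, so an interruption should leave at most one copy mid-write. Two caveats: `fsync()` is only as good as the CF card's handling of cache flushes, and a newly created file's directory entry is only guaranteed durable once the parent directory has also been `fsync()`ed, which this sketch omits.

```cpp
#include <fcntl.h>   // open(), O_* flags
#include <unistd.h>  // write(), fsync(), close()

#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>

// Write all `len` bytes from `data` to `fd`, retrying on short writes and EINTR.
static bool write_all(int fd, const char* data, std::size_t len) {
    while (len > 0) {
        ssize_t n = ::write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        data += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Append one record to `path` and ask the kernel to push it to the device.
// The 0644 file mode is an arbitrary choice for this sketch.
static bool durable_append(const std::string& path, const std::string& record) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return false;
    bool ok = write_all(fd, record.data(), record.size()) && ::fsync(fd) == 0;
    ok = (::close(fd) == 0) && ok;  // always close, even after a failure
    return ok;
}

// Hypothetical helper: append `record` to each mirror in turn. Each copy is
// fully written and fsync()ed before the next one is opened, so (if the
// device honours flushes) a power cut leaves at most one copy mid-write.
// Returns false if any copy failed.
bool mirrored_append(const std::vector<std::string>& paths, const std::string& record) {
    bool all_ok = true;
    for (const std::string& path : paths) {
        if (!durable_append(path, record)) all_ok = false;
    }
    return all_ok;
}
```

A logger would call something like `mirrored_append({"/mnt/log_a/data.log", "/mnt/log_b/data.log"}, line)` once per measurement, where both mount points are hypothetical and ideally sit on separate devices rather than two partitions of the same card.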

I have read the answers to this related question, "Do journaling filesystems guarantee against corruption after a power failure?", but it doesn't quite cover some of the things that are confusing me.

I realise that I am asking a lot of questions, but it seems that despite reading a lot of material I have fundamentally failed to understand the risks to my data in the event of power loss.

Best Answer

As with all things pertaining to security, there aren't any guarantees, but you also need to balance risk (and cost) against probability. From experience (and I've been running dozens of *nix boxen since the dark ages), I've never really had significant power-caused filesystem corruption.

Some of these machines were even running on non-journalled filesystems (ufs and ext2 usually). Some of them were embedded, and a few were mobile phones like the Nokia N900 — so a good power supply wasn't at all guaranteed.

It's not that filesystem corruption can't happen, it's just that the probability of it happening is low enough that it shouldn't worry you. Still, no reason not to hedge your bets.

In answer to your literal questions:

  1. At least the first book you referenced was written before ext4, so when the author suggests using ext3, they're really saying ‘don't use unstable or non-journalled filesystems like ext2’. Try ext4: it's quite mature, and has some decent options for non-spinning disks which may extend the life expectancy of your flash device.
  2. Chances are it would lose you the last block or two, not the entire file. With a journalled filesystem, this will be about the only loss. There are failure scenarios where I could see random data sprayed across the file, but they seem about as likely as a micrometeorite smashing right through your embedded device.
  3. See 2. Nothing is 100.00% safe.
  4. If you have a second IDE channel, stick a second CF card in there and grab a backup of the filesystem periodically. There are a few ways to do this: rsync, cp, dump, dd, or even the md(4) (software RAID) device (you add the second drive occasionally, let it sync, then remove it; if both devices are live all the time, they run the same risk of filesystem corruption). If you use LVM, you can even grab snapshots. For a data collection embedded device, I'd just use an ad hoc solution which mounts the second filesystem, copies over the data log, then immediately unmounts it. If you're worried about the device having a good boot image, stick a second copy of the boot manager and all necessary boot images on the second device and configure the computer to boot from either CF card.

    I wouldn't trust a second copy on the same device, because storage devices fail more often than stable filesystems. Much more often, in my experience so far (at work, there was a bitter half-joke about the uncannily high chance of disk failures on Friday afternoons; for a while it was almost a weekly event). Whether the disk is spinning or not, it can fail. So keep your eggs in two baskets if you can, and you'll protect your data better.

    If the data is particularly sensitive, I'd pay regular visits to the device, swap the backup CF for a fresh one and reboot, letting it fsck all its filesystems for good measure.
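
Point 2 can also be handled from the application side by making a torn tail detectable. The following C++ sketch is one way to do that, not something the answer itself prescribes: every record carries its own length and CRC-32 (a hypothetical layout of a 4-byte little-endian length, a 4-byte CRC and then the payload), and `recover_log()`, run once at boot before logging resumes, truncates the file back to the last record whose checksum matches. All function names are illustrative, and a checksum only detects damage; it cannot repair a failed card, which is what a second CF card is for.

```cpp
#include <fcntl.h>   // open(), O_* flags
#include <unistd.h>  // write(), fsync(), ftruncate(), close()

#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical on-disk record layout:
//   [4-byte little-endian payload length][4-byte little-endian CRC-32][payload]

// Standard CRC-32 (reflected polynomial 0xEDB88320), bit-by-bit version.
std::uint32_t crc32_ieee(const unsigned char* p, std::size_t n) {
    std::uint32_t c = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i) {
        c ^= p[i];
        for (int k = 0; k < 8; ++k)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return c ^ 0xFFFFFFFFu;
}

static void put_u32(std::string& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFFu));
}

static std::uint32_t get_u32(const std::string& s, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(s[pos + i])) << (8 * i);
    return v;
}

// Build one framed record: header (length + CRC) followed by the payload.
std::string frame_record(const std::string& payload) {
    std::string out;
    put_u32(out, static_cast<std::uint32_t>(payload.size()));
    put_u32(out, crc32_ieee(reinterpret_cast<const unsigned char*>(payload.data()), payload.size()));
    return out + payload;
}

// Length of the longest prefix of `log` made of complete records whose CRC
// matches. Anything after that point is a torn or corrupted tail.
std::size_t valid_prefix_length(const std::string& log) {
    std::size_t pos = 0;
    while (log.size() - pos >= 8) {
        std::uint32_t len = get_u32(log, pos);
        std::uint32_t crc = get_u32(log, pos + 4);
        if (log.size() - pos - 8 < len) break;  // payload cut short
        const auto* p = reinterpret_cast<const unsigned char*>(log.data() + pos + 8);
        if (crc32_ieee(p, len) != crc) break;   // payload damaged
        pos += 8 + static_cast<std::size_t>(len);
    }
    return pos;
}

// Append one framed record, retrying short writes, then fsync() it.
bool append_record(const std::string& path, const std::string& payload) {
    const std::string rec = frame_record(payload);
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return false;
    std::size_t off = 0;
    bool ok = true;
    while (ok && off < rec.size()) {
        ssize_t n = ::write(fd, rec.data() + off, rec.size() - off);
        if (n > 0) off += static_cast<std::size_t>(n);
        else if (n < 0 && errno == EINTR) continue;
        else ok = false;
    }
    ok = ok && ::fsync(fd) == 0;
    ok = (::close(fd) == 0) && ok;
    return ok;
}

// Run once at startup: cut the log back to its last valid record so new
// records are appended after good data rather than after garbage.
bool recover_log(const std::string& path) {
    std::string data;
    {
        std::ifstream in(path, std::ios::binary);
        if (!in) return true;  // no log yet, nothing to recover
        data.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    }
    const std::size_t keep = valid_prefix_length(data);
    if (keep == data.size()) return true;
    int fd = ::open(path.c_str(), O_WRONLY);
    if (fd < 0) return false;
    bool ok = ::ftruncate(fd, static_cast<off_t>(keep)) == 0 && ::fsync(fd) == 0;
    ok = (::close(fd) == 0) && ok;
    return ok;
}
```

The design choice here is to treat anything after the first bad record as lost, which matches the expectation above that a power cut costs the last block or two rather than the whole file.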