As with all things pertaining to security, there aren't any guarantees, but you also need to balance risk (and cost) against probability. From experience (and I've been running dozens of *nix boxen since the dark ages), I've never really had significant power-caused filesystem corruption.
Some of these machines were even running on non-journalled filesystems (ufs and ext2 usually). Some of them were embedded, and a few were mobile phones like the Nokia N900 — so a good power supply wasn't at all guaranteed.
It's not that filesystem corruption can't happen, it's just that the probability of it happening is low enough that it shouldn't worry you. Still, no reason not to hedge your bets.
In answer to your literal questions:
- At least the first book you referenced was written before `ext4` existed. When the author suggests using `ext3`, they're really saying "don't use unstable or non-journalled filesystems like `ext2`". Try `ext4`; it's quite mature, and it has some decent options for non-spinning disks which may extend the life expectancy of your flash device.
- Chances are it would lose you the last block or two, not the entire file. With a journalled filesystem, this will be about the only loss. There are failure scenarios where I could see random data sprayed across the file, but they seem about as likely as a micrometeorite smashing right through your embedded device.
- See 2. Nothing is 100.00% safe.
If you have a second IDE channel, stick a second CF card in there and grab a backup of the filesystem periodically. There are a few ways to do this: `rsync`, `cp`, `dump`, `dd`, even using the `md(4)` (software RAID) device (you add the second drive occasionally, let it sync, then remove it; if both devices are live all the time, they run the same risk of filesystem corruption). If you use LVM, you can even grab snapshots. For a data-collection embedded device, I'd just use an ad hoc solution which mounts the second filesystem, copies over the data log, then immediately unmounts it. If you're worried about the device having a good boot image, stick a second copy of the boot manager and all necessary boot images on the second device and configure the computer to boot from either CF card.
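A minimal sketch of that ad hoc mount/copy/unmount cycle, assuming a second CF card and a data-log directory (the device name, mount point, and paths below are placeholders, not anything from your setup):

```shell
#!/bin/sh
# Ad hoc backup sketch: mount the backup CF card, copy the data log
# onto it, then unmount it immediately so the backup filesystem spends
# as little time as possible exposed to a power failure.

backup_datalog() {
    dev="$1"    # e.g. /dev/hdb1 -- the second CF card (placeholder)
    mnt="$2"    # e.g. /mnt/backup
    src="$3"    # e.g. /var/log/datalogger

    mkdir -p "$mnt"
    mount "$dev" "$mnt" || return 1
    mkdir -p "$mnt/datalog"
    cp -a "$src/." "$mnt/datalog/"   # rsync -a would also work here
    umount "$mnt"
    sync
}

# Run it from cron, for example:
# backup_datalog /dev/hdb1 /mnt/backup /var/log/datalogger
```

Keeping the backup filesystem unmounted except during the copy is the point: an unmounted filesystem can't be corrupted by a power cut.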
I wouldn't trust a second copy on the same device because storage devices fail more often than stable filesystems. Much more often, in my experience so far (at work, there was a bitter half-joke about the uncannily high chances of Friday afternoon disk failures. It was almost a weekly event for a while). Whether the disk is spinning or not, it can fail. So keep your eggs in two baskets if you can, and you'll protect your data better.
If the data is particularly sensitive, I'd pay regular visits to the device, swap the backup CF for a fresh one, and reboot, letting it `fsck` all its filesystems for good measure.
All three data journaling modes should leave the filesystem itself fully intact after a power failure, so it should always mount without errors. The difference is only in the data in your files: `data=writeback` mode may leave stale data (i.e., whatever was stored in the disk sectors before the writes your app did); `data=ordered` and `data=journal` should not do this.
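If you want the strongest mode, you can request it explicitly in `/etc/fstab` (the device and mount point below are placeholders; on ext3/ext4 the option is spelled `data=journal`, and it takes effect at mount time rather than on a remount):

```
# /etc/fstab (placeholder device and mount point)
/dev/sda2  /data  ext4  defaults,data=journal  0  2
```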
Most likely what you're seeing is that I/O barriers aren't working on your setup. First, make sure you're not mounting with `barrier=0` / `nobarrier`. That boosts performance, but will cause corruption on power failure.
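A quick way to check for that, sketched as a small shell function (the function name and the regex are mine, not from any standard tool):

```shell
# Warn if an fstab-style file disables write barriers via barrier=0
# or nobarrier. Pass the path in, e.g.: check_barriers /etc/fstab
check_barriers() {
    if grep -Eq '(^|[ ,])(barrier=0|nobarrier)([ ,]|$)' "$1"; then
        echo "WARNING: write barriers disabled in $1"
        return 1
    fi
    echo "no barrier=0/nobarrier options found in $1"
}
```

It's also worth comparing against `/proc/mounts`, which shows the options the kernel actually mounted with; fstab and reality can disagree.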
If I/O barriers are on, it's also possible your I/O is passing through a storage layer that doesn't support them. On older releases, LVM didn't, and neither did various mdraid levels. (This was fixed in Linux 2.6.33, so it only matters if you're still running Lucid.)
Finally, it's possible your disks are telling lies. Disks have write caches. Especially with NCQ, they're supposed to tell the OS they've written data only once they've actually done so, but they've been known to tell the OS it's written when it's only in the disk's write cache. That increases performance, at least as long as the power stays on. You can try disabling the write cache on the disks, though you'll take a performance hit for it.
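On IDE/SATA disks the write cache can usually be toggled with `hdparm` (the device name is a placeholder, and this needs root):

```shell
# Query the current write-cache setting (placeholder device):
hdparm -W /dev/sda

# Turn the on-disk write cache off; expect slower writes:
hdparm -W 0 /dev/sda
```

Note the setting may not survive a power cycle on all drives, so it belongs in a boot script if you rely on it.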
Note also that flash-memory disks have a lot of work to do under the hood, and many of them don't handle power failure well. (For example, wear leveling sometimes requires that a full flash block of data be moved. If the power fails in the middle, bad things happen on some flash disks.)
Finally... have you considered a UPS?
There are no guarantees. A journaling file system is more resilient and less prone to corruption, but it is not immune.
A journal is simply a list of operations which have recently been applied to the file system. The crucial part is that the journal entry is written before the operations take place. Most operations have multiple steps. Deleting a file, for example, might entail removing the file's entry in the file system's table of contents and then marking the sectors on the drive as free. If something happens between the two steps, a journaled file system can tell immediately and perform the necessary clean-up to keep everything consistent. This is not the case with a non-journaled file system, which has to examine the entire contents of the volume to find errors.
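The write-ahead idea can be illustrated with a toy sketch (this only demonstrates the ordering, not how a real filesystem journal is implemented; the file names are made up):

```shell
# Toy sketch of write-ahead journaling: record the intended operation
# durably *before* performing it, so a crash between the steps leaves
# evidence that recovery can act on.
JOURNAL=./journal.log

journaled_delete() {
    file="$1"
    echo "BEGIN delete $file" >> "$JOURNAL"
    sync                          # the journal entry hits disk first
    rm -f "$file"                 # the multi-step operation itself
    echo "END delete $file" >> "$JOURNAL"
    sync
}

# On boot, any BEGIN without a matching END marks an interrupted
# operation that recovery must finish or undo.
```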
While journaling makes corruption much less likely, corruption can still occur: for example, if the hard drive is mechanically malfunctioning, or if writes to the journal itself fail or are interrupted.
The basic premise of journaling is that writing a journal entry is usually much quicker than the actual transaction it describes. So the period between the OS ordering a (journal) write and the hard drive fulfilling it is much shorter than for a normal write: a narrower window for things to go wrong in, but still a window.