As with all things pertaining to security, there aren't any guarantees, but you also need to balance risk (and cost) against probability. From experience (and I've been running dozens of *nix boxen since the dark ages), I've never really had significant power-caused filesystem corruption.
Some of these machines were even running on non-journalled filesystems (ufs and ext2 usually). Some of them were embedded, and a few were mobile phones like the Nokia N900 — so a good power supply wasn't at all guaranteed.
It's not that filesystem corruption can't happen, it's just that the probability of it happening is low enough that it shouldn't worry you. Still, no reason not to hedge your bets.
In answer to your literal questions:
- At least the first book you referenced was written before `ext4` — when the author suggests using `ext3`, they're really saying "don't use unstable or non-journalled filesystems like `ext2`". Try `ext4`; it's quite mature, and has some decent options for non-spinning disks which may extend the life expectancy of your flash device.
- Chances are it would lose you the last block or two, not the entire file. With a journalled filesystem, this will be about the only loss. There are failure scenarios where I could see random data sprayed across the file, but they seem about as likely as a micrometeorite smashing right through your embedded device.
- See 2. Nothing is 100.00% safe.
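As an illustration of those options for flash media, an `/etc/fstab` line might look like this (the device name is an assumption; `noatime` avoids a metadata write on every file read, and `commit=60` trades up to a minute of buffered data for fewer flash writes):

```
# /etc/fstab fragment -- sketch; /dev/hda1 and the 60 s commit interval are assumptions
/dev/hda1  /  ext4  noatime,commit=60,errors=remount-ro  0  1
```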
If you have a second IDE channel, stick a second CF card in there and grab a backup of the filesystem periodically. There are a few ways to do this: `rsync`, `cp`, `dump`, `dd`, even using the `md(4)` (software RAID) device (you add the second drive occasionally, let it sync, then remove it — if both devices are live all the time, they run the same risk of filesystem corruption). If you use LVM, you can even grab snapshots. For a data-collection embedded device, I'd just use an ad hoc solution which mounts the second filesystem, copies over the data log, then immediately unmounts it. If you're worried about the device having a good boot image, stick a second copy of the boot manager and all necessary boot images on the second device and configure the computer to boot from either CF card.
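A minimal sketch of that mount-copy-unmount approach; the device, mount point, and log directory are assumptions for illustration, and the function skips quietly if the backup card isn't present:

```shell
#!/bin/sh
# Sketch: copy the data log to a second CF card, then unmount it again.
# /dev/hdb1, /mnt/backup and /var/log/datalog are assumed names.
backup_to_cf() {
    dev=$1
    mnt=$2
    src=$3
    if [ ! -b "$dev" ]; then
        echo "backup device $dev not present, skipping"
        return 0
    fi
    mount "$dev" "$mnt" || return 1
    rsync -a --delete "$src/" "$mnt/datalog/"   # incremental copy
    sync                                        # flush before unmounting
    umount "$mnt"
    echo "backup complete"
}

backup_to_cf /dev/hdb1 /mnt/backup /var/log/datalog
```

Run it from cron, or from whatever event loop the data logger already has.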
I wouldn't trust a second copy on the same device because storage devices fail more often than stable filesystems. Much more often, in my experience so far (at work, there was a bitter half-joke about the uncannily high chances of Friday afternoon disk failures. It was almost a weekly event for a while). Whether the disk is spinning or not, it can fail. So keep your eggs in two baskets if you can, and you'll protect your data better.
If the data is particularly sensitive, I'd pay regular visits to the device, swap the backup CF for a fresh one and reboot, letting it `fsck` all its filesystems for good measure.
There's a bit of an inconsistency, or at least an ambiguity, in your story here:
> I'd still rather lose it all than facing an 'unable to mount', 'wait for this 10 minutes fsck'

This implies -- although you don't actually say it -- that this is a problem you are actually experiencing. But then:
> e2fsprogs-libs (dependency to jfsutils) seems to be hellishly difficult to compile in my distribution.

Meaning you don't have any fsck at all, since `e2fsprogs-libs` is a dependency for `e2fsprogs`, which provides `e2fsck`. So perhaps you are still in a planning stage here and have not even tested the system with, e.g., `ext4`, but instead jumped to the conclusion that you should start with JFS? Is there any particular reason for that?
I've noticed on the Raspberry Pi exchange (the pi's primary storage is also an SD card) that a significant number of users seem to be very frustrated by problems of this sort, even though the majority (including myself) have never had it at all. At first I assumed these were people ignorant of the fact that the system should be cleanly shut down, but that is not a hard point to grasp when explained, and there are people who report it even though the system HAS been shut down properly.
You've already said you need this to be able to tolerate power cuts (which is fair enough), but I mention this because it implies there are some pis, or some SD cards, or some combination of both, that are just prone to corrupting the filesystem due to some event (surge?) that occurs regularly either when the plug is pulled, or when it is put back in. I also have NOT seen -- and there's been plenty of time for plenty of people to try -- ANY reports of someone saying they've switched to btrfs or jfs or whatever and now the problem is solved.
The other mysterious thing about this is even if people are yanking the cord, this should not regularly result in an unusable filesystem. Certainly I've done it a bunch of times w/ the pi, and scores if not hundreds of times w/ a regular linux box (the power was cut, the system has become unresponsive, I'm exhausted and angry, etc.) and while I've seen minor data loss, I've never seen a filesystem corrupted to the point of being unusable after a quick fsck.
Again, presuming all these reports are true (I don't see why numbers of people would lie about it), there's something much more going on than just not cleanly unmounting, but it seems to only affect a small percentage of users, implying again some kind of common hardware defect.
On the pi I write `-y` to `/forcefsck` in a boot script, so that on the next boot fsck is run automatically and any problems are fixed, regardless of whether this appears to be necessary or not. On a 700 MHz single core this takes ~10 seconds for a 12 GB filesystem containing ~4 GB of data. So "10 minutes" sounds like an incredibly long time, especially since you've already said "This is the small filesystem for write!".
You might also consider calling `sync` at regular intervals.
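For example, a crontab entry (assuming a cron daemon is available on the device) flushing dirty buffers once a minute:

```
# /etc/crontab fragment -- assumes a running cron daemon
* * * * *  root  /bin/sync
```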
Finally, you should update the question with more factual, specific details of the problems you have actually encountered, and less hyperbole. Otherwise it just looks too much like a premature XY problem, which will likely get quickly skipped over by people with a lot of experience and potential advice for you.
Best Answer
The main thing you need to do is issue a `sync` system call. There's a `sync` utility that does just that. When the `sync` system call returns, it guarantees that any filesystem write operation (on any mounted filesystem) that was issued before the `sync` is completed.

It's up to your application design to ensure that if this happens in the middle of a sequence of write operations, the data is left in a usable state. However, if you have the luxury of a guaranteed warning period before power loss, you can be sloppier in your application design, as long as you guarantee a timely response to the power-loss notice (which is hard).

With journaling filesystems such as ext4, if you sync and then turn off power, you won't get an fsck on reboot. However, if something causes a write after the sync, I think it's possible, but rare, that fsck could be needed. If you want to be absolutely sure, unmount all read-write filesystems before the power loss, or at least remount them as read-only. Normally, you can't do that if there are files open for writing. If your system runs Linux, you can use its magic SysRq feature (you'll need to make sure that it's enabled). This can be invoked programmatically by writing a character to `/proc/sysrq-trigger`: `echo u >/proc/sysrq-trigger` force-remounts all filesystems as read-only (this includes the effect of `sync`). You can also use this interface to reboot (`b`) or power off (`o`), if that's useful in your setup.

If the power-loss notice might be cancelled, you can call `sync`: that has no ill effect on anything but performance. A forced remount read-only, on the other hand, is not recoverable in general, so do that only when you've committed to rebooting.

For most setups that match your description, this is a reasonable reaction to a power-loss notice:

    echo u >/proc/sysrq-trigger
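Putting the pieces together, a power-loss handler could look like the sketch below. It is assumed to be invoked by whatever monitors your supply (UPS daemon, GPIO watcher); the trigger path is parameterized only so the sketch can be exercised without actually remounting anything:

```shell
#!/bin/sh
# Sketch of a power-loss handler. In production SYSRQ is the real
# /proc/sysrq-trigger; the variable exists only to make testing safe.
SYSRQ=${SYSRQ:-/proc/sysrq-trigger}

on_power_loss() {
    sync                  # safe even if the notice is later cancelled
    # Point of no return: remount everything read-only, then power off.
    echo u > "$SYSRQ"
    echo o > "$SYSRQ"
}
```

Call `on_power_loss` only once you've committed to going down, since the forced read-only remount is not recoverable in general.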