Ubuntu – ext4 filesystems frequently corrupting

ext4, filesystems, journaling, Ubuntu

We have occasional power outages in our environment, which seem to cause data corruption on our Ubuntu machines with ext4 filesystems.

To my understanding ext4's default is to use
data=ordered

Which is described as "All data are forced directly out to the main file system prior to its metadata being committed to the journal."

Does this mean that if there is a power outage and a write to disk is interrupted, there can be filesystem corruption?

If I want to completely eliminate filesystem corruption due to power outages, I'd guess I would use data=journal. Are there any negative impacts to this other than a performance hit?

Bonus: How do I change the journaling mode on my filesystem from data=ordered to another one? I'm guessing I'd need to make modifications to the journal, but I'm not quite sure how or in what order to perform these operations.

It's getting really annoying that the Ubuntu initramfs doesn't include any filesystem recovery utilities, so anything that saves us from having to pop in a live CD would be great.

My /etc/fstab

# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
# / was on /dev/sda1 during installation
UUID=9cd71f51-53bb-44c7-affa-14293e59d596 /               ext4    errors=remount-ro 0       1
# swap was on /dev/sda5 during installation
UUID=5568cee1-a50b-4409-ad67-cdc5bfb592a3 none            swap    sw              0       0
/dev/scd0       /media/cdrom0   udf,iso9660 user,noauto,exec,utf8 0       0

OS version

-bash-4.0# uname -a
Linux LG-F3-19 2.6.31-14-server #48-Ubuntu SMP Fri Oct 16 15:07:34 UTC 2009 x86_64 GNU/Linux
-bash-4.0# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 9.10
Release:        9.10
Codename:       karmic

Image of failure: http://imgur.com/odo4iBY

References:
https://www.kernel.org/doc/Documentation/filesystems/ext4.txt
http://www.ibm.com/developerworks/library/l-journaling-filesystems/

Best Answer

All three data journaling modes should leave the filesystem itself fully intact after a power failure, so it should always mount without errors. The difference is only in the data in your files: data=writeback mode may leave stale data (i.e., whatever was stored in the disk sectors before the writes your app did). data=ordered and data=journal should not do this.
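On the bonus question: if you still want to switch modes, one approach is to record the mode as a default mount option in the superblock with tune2fs, which takes effect at the next mount. Here's a minimal sketch of that. The function name `set_default_data_mode`, the `DRY_RUN` switch, and `/dev/sda1` are made up for illustration; the `journal_data*` names are the tune2fs `-o` options.

```shell
#!/bin/sh
# Sketch: store a default ext4 data mode in the superblock with tune2fs.
# set_default_data_mode and DRY_RUN are hypothetical names for this example.
# With DRY_RUN set, the commands are printed instead of executed.
set_default_data_mode() {
    dev="$1"
    case "$2" in
        journal)   opt=journal_data ;;
        ordered)   opt=journal_data_ordered ;;
        writeback) opt=journal_data_writeback ;;
        *) echo "unknown data mode: $2" >&2; return 1 ;;
    esac
    run=${DRY_RUN:+echo}
    # Set the default mount option; it applies from the next mount onward.
    $run tune2fs -o "$opt" "$dev" || return 1
    # Dump the superblock so "Default mount options" can be checked by eye.
    $run tune2fs -l "$dev"
}
```

Try `DRY_RUN=1 set_default_data_mode /dev/sda1 journal` first to see what would run. For a non-root filesystem, adding `data=journal` to the options column in /etc/fstab should also work. For the root filesystem, as far as I know ext4 refuses to change the data mode on a remount, so the superblock default above or the `rootflags=data=journal` kernel parameter is the safer route.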

Most likely what you're seeing is that I/O barriers aren't working on your setup. First, make sure you're not mounting with barrier=0/nobarrier. That boosts performance, but will cause corruption on power failure.
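A quick way to check is to look at the options the kernel reports for each ext4 mount. This is a rough sketch; `check_barriers` is just a name I picked, and it assumes the usual "device mountpoint type options" layout of /proc/mounts.

```shell
#!/bin/sh
# Sketch: report ext4 mounts whose options disable write barriers.
# check_barriers is a hypothetical helper; it reads a mounts table
# (default /proc/mounts) laid out as "device mountpoint type options ...".
check_barriers() {
    mounts_file="${1:-/proc/mounts}"
    awk '$3 == "ext4" {
        n = split($4, opts, ",")
        state = "barriers not disabled"
        for (i = 1; i <= n; i++)
            if (opts[i] == "nobarrier" || opts[i] == "barrier=0")
                state = "BARRIERS DISABLED"
        print $2 ": " state
    }' "$mounts_file"
}
```

Run `check_barriers` with no argument to inspect the live system. Any mount flagged as disabled points at a `nobarrier` or `barrier=0` option somewhere, typically in /etc/fstab.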

If I/O barriers are on, it's also possible you're passing through a storage layer that doesn't support them. On older kernels, LVM didn't, and neither did various mdraid levels. (This was fixed in Linux 2.6.33, so it only applies if you're still running Lucid or something older, such as the 2.6.31 kernel in your uname output.)
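To see whether an ext4 mount sits on top of such a layer, you can make a rough guess from the device path. This is only a heuristic sketch: `classify_device` and `list_ext4_layers` are made-up helper names, and a path like /dev/root or /dev/disk/by-uuid/... tells you nothing either way.

```shell
#!/bin/sh
# Sketch: guess from the device path whether a mount passes through
# device-mapper (LVM) or md RAID. Both helper names are hypothetical.
classify_device() {
    case "$1" in
        /dev/mapper/*|/dev/dm-*) echo "device-mapper (LVM or similar)" ;;
        /dev/md*)                echo "md RAID" ;;
        *)                       echo "not obviously LVM or md" ;;
    esac
}

# Print every ext4 mount in a mounts table (default /proc/mounts) with the guess.
list_ext4_layers() {
    awk '$3 == "ext4" { print $1, $2 }' "${1:-/proc/mounts}" |
    while read -r dev mnt; do
        echo "$mnt on $dev: $(classify_device "$dev")"
    done
}
```

Running `list_ext4_layers` and then searching the kernel log with `dmesg | grep -i barrier` may show whether the kernel gave up on barriers for one of those devices.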

Finally, it's possible your disks are telling lies. Disks have write caches. Especially with NCQ, they're supposed to tell the OS they've written data only when they've actually done so, but they've been known to tell the OS it's written when it's only in the disk's write cache. That increases performance, at least as long as the power stays on. You can try disabling the write cache on the disks, though you'll take a performance hit for this.
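For SATA/IDE disks, hdparm's `-W` flag is the usual way to do that. A minimal sketch follows; `disable_write_cache` and the `DRY_RUN` switch are names I made up, and `/dev/sda` is a placeholder for your actual disk.

```shell
#!/bin/sh
# Sketch: query, then disable, a drive's volatile write cache via hdparm.
# disable_write_cache and DRY_RUN are hypothetical names for this example.
# With DRY_RUN set, the commands are printed instead of executed.
disable_write_cache() {
    dev="$1"
    run=${DRY_RUN:+echo}
    $run hdparm -W "$dev" || return 1    # no value: report the current setting
    $run hdparm -W0 "$dev"               # 0 = turn write caching off
}
```

Use `DRY_RUN=1 disable_write_cache /dev/sda` to preview, then run it as root without `DRY_RUN`. The setting may not survive a power cycle on every drive, so it probably needs to be reapplied at boot.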

Note also that flash-memory disks have a lot of work to do under the hood, and many of them don't handle power failure well. (For example, wear leveling sometimes requires that a full flash block of data be moved. If the power fails in the middle, bad things happen on some flash disks.)

Finally... have you considered a UPS?
