Linux – Ext3 drive will not mount after power failure; how to recover files

data-recovery fsck linux troubleshooting Ubuntu

After a recent power failure, which caused my Linux box (Ubuntu 8.10) to abruptly power off twice from a normal running state, I have a drive that will not mount.

UPDATE: The drive will sometimes mount, but it shows up as completely empty (not even lost+found) and reports 14.9 GB free (it is a 500 GB drive). When I try to do anything, I get a permission error and the drive unmounts (or perhaps it was never really mounted in the first place?).

Here's the error message when I try to mount:

~$ sudo mount -a
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

So maybe specify the fs type?

~$ sudo mount -t ext3 /dev/sdd1 /media/disk-7
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

No, same. So something is messed up?

~$ sudo fsck /dev/sdd1
fsck 1.41.3 (12-Oct-2008)
e2fsck 1.41.3 (12-Oct-2008)
/dev/sdd1: recovering journal
fsck.ext3: No such file or directory while trying to re-open /dev/sdd1
Warning... fsck.ext3 for device /dev/sdd1 exited with signal 11.

Googling for signal 11 wasn't encouraging, but I found a few other ways to try to repair the disk:

~$ sudo e2fsck /dev/sdd1
e2fsck 1.41.3 (12-Oct-2008)
/dev/sdd1: recovering journal
e2fsck: No such file or directory while trying to open /dev/sdd1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 [device] 

Still hoping this failure has something to do with the power outage, I assume the superblock is corrupt and try a backup one. (I first determined that my block size is 32k using mke2fs -n.)

~$ sudo e2fsck -b 32768 /dev/sdd1
e2fsck 1.41.3 (12-Oct-2008)
ext3 recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
/dev/sdd1: recovering journal
e2fsck: Journal must be at least 1024 blocks while recovering 
ext3 journal of /dev/sdd1

Per Avery Payne below I tried the following:

sudo mount -t ext2 -o ro /dev/sdd1 /media/disk-7

But got this error message:

mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
~$ dmesg | tail
[261157.639721] EXT2-fs: sdd1: couldn't mount because of unsupported optional features (4).

And that's about where I'm stuck. I tried every backup superblock listed and got the same result. If it helps any, the "recovering journal" step takes a long time before it moves on to tell me it isn't working.

Honestly, I don't care much about getting back the state of the drive minutes before the crash, just about recovering the 400+ GB of other data that is on it. If anyone knows anything else I can try, ext3 data recovery utilities or techniques, etc, I would greatly appreciate it!

Best Answer

The problems you're having sound far more extensive than what I'd expect from mere loss of power (even during fairly heavy write activity) on a device. I have to wonder if you're really having more problems at the interface/driver level, or a corrupted partition table or something of that sort.

From the sounds of things you may have exacerbated the problem further with all the thrashing around you've done while trying to fix the issue.

I don't know if we can help with this case but don't give up yet.

For the future I'd suggest that you learn the following technique:

When you have trouble with a drive under Linux or UNIX, you can usually use dd to make a bit-image copy of the whole device to some other location. Find a drive that's at least as large as the one in question and try a command like:

dd if=$PROBLEMATIC of=$TARGET bs=4M

Be very careful about the if (input file) and of (output file) directives. Let that run. It's a good idea to run tail -f /var/log/messages (or a variant appropriate to your /etc/syslog.conf) at the same time, either in the background or in another window, so you can watch for I/O errors. There are enhanced versions of dd that handle retries and continuing past bad blocks more robustly (sdd is a name that comes to mind). But try the stock GNU dd command first.

You can make such a copy of the whole device (/dev/sdd, for example) or just the partition (/dev/sdd1). If you get "short read" or similar errors, it suggests either that the device has physical errors preventing reads past certain cylinders or, in the case of a partition, that the partition table is mangled in some way. You can even make two different dd images, one of each.
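As a rough illustration of the imaging step, here is a minimal sketch using only stock dd. The function name image_device and the example paths are made up for illustration, and it assumes the target is an image file on a separate, healthy drive. The refusal to overwrite an existing target is an extra guard I've added against swapping if= and of=; it is not something dd does for you.

```shell
#!/bin/sh
# Sketch: copy a device (or partition) bit for bit into an image file.
# image_device is a hypothetical helper name, not part of any tool.
image_device() {
    src=$1   # what to read, e.g. /dev/sdd (whole device) or /dev/sdd1 (partition)
    dst=$2   # where to write, e.g. /mnt/spare/sdd1.img on another drive

    # Guard against swapped arguments: never write over something that exists.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing target: $dst" >&2
        return 1
    fi

    # Plain GNU dd with a large block size, as suggested above.
    dd if="$src" of="$dst" bs=4M
}

# Example (as root), one image of each:
#   image_device /dev/sdd  /mnt/spare/sdd.img
#   image_device /dev/sdd1 /mnt/spare/sdd1.img
```

If dd stops early with a read error, that is the point at which the enhanced dd variants mentioned above become worth trying.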

Here's the trick: do all your fsck and mount attempts, and use your various other recovery tools such as TCT (The Coroner's Toolkit) on the copied image!

This minimizes the time spent running the drive (which is possibly degrading at the hardware level as you operate it) and minimizes the impact of failed and possibly misguided recovery attempts. (In some situations you make one image, then another based on that and always operate on the tertiary image ... depends on how much the data is worth).
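One way to set that up, sketched under the assumption that the first dd image is a regular file: mark it read-only and do all experiments on a second copy. The helper name make_working_copy and the paths are hypothetical. The commented commands at the end show how fsck and a read-only loop mount might then be pointed at the copy instead of the drive.

```shell
#!/bin/sh
# Sketch: keep the first image as a master and experiment on a copy of it.
# make_working_copy is a hypothetical helper name; paths are placeholders.
make_working_copy() {
    master=$1   # image produced by dd, e.g. /mnt/spare/sdd1.img
    work=$2     # scratch copy that recovery tools may modify

    chmod a-w "$master" || return 1   # discourage accidental writes to the master
    cp "$master" "$work" || return 1  # the copy inherits the read-only mode
    chmod u+w "$work"                 # so make the copy writable again
}

# Typical follow-up steps, run as root against the working copy only:
#   e2fsck -fn /mnt/spare/work.img                      # -n: report problems, change nothing
#   mount -o loop,ro /mnt/spare/work.img /mnt/recover   # read-only loop mount
```

If a repair attempt makes things worse, delete the working copy and make a fresh one from the master; the drive itself is never touched again.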

I personally suggest that you run something like hexdump or strings to read through the image ... just let it scroll past for a long time and look for plain text that looks like it might be fragments of your data. I have used grep to recover useful (textual) data from otherwise completely mangled filesystems. In this case I'm not suggesting it as data recovery heroics, but as a sanity check. If you scroll through tens of megabytes or a few gigabytes of data and don't see any recognizable text, then you probably have a hopeless case or you've done something very wrong (were you really careful about those if= and of= options?).
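For the sanity check, something along these lines may be enough. It is only a sketch: scan_image is a made-up helper name, and the search string is whatever text you know should be somewhere on the drive. It relies on grep's -a (treat binary as text), -b (print byte offset) and -o (print only the match) options.

```shell
#!/bin/sh
# Sketch: look for a known piece of text inside a raw disk image.
# scan_image is a hypothetical helper name.
scan_image() {
    img=$1       # image file made with dd
    pattern=$2   # text you expect to exist somewhere in your data

    # LC_ALL=C makes grep work on raw bytes rather than locale characters.
    # Each output line has the form  byte-offset:matched-text
    LC_ALL=C grep -a -b -o -- "$pattern" "$img"
}

# To browse instead of search, page through the printable runs:
#   strings /mnt/spare/sdd1.img | less
#   hexdump -C /mnt/spare/sdd1.img | less
```

Hits at plausible offsets suggest the image really does contain your data, even if the filesystem structures around it are damaged.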

I don't know if any of this will help you with the current effort. But learn these tricks now and they will definitely make your next foray into data recovery much less scary. (Yes, practice on a healthy system once or twice: use a hex editor to add your own creative corruption here and there, to the COPY of course! Then try to fix it.)

Oh, and this is a really good time to review your backup and data recovery plans and procedures (or provide better advice to your customer/colleague/client/friend/whatever).
