Linux Data Recovery – Recover Data from RAID and Disk Failure

data-recovery, hard-disk, mdadm, raid

Some background

A friend of mine was using a Buffalo LS-WVL NAS with two 1 TB disks in his office. The two disks appeared to be mounted as RAID 1 but, as you will read, they probably were not. The NAS became extremely slow and then suddenly stopped working altogether. I've been called in to rescue his data.
Both disks have exactly the same partitioning: one primary and six logical partitions, with the data living in the 6th (approx. 80 GB out of 0.95 TB).

Disk /dev/sdd seems to have hardware problems (slowness, sector read errors, etc.), whereas /dev/sde is physically healthy.

The goal is to extract the data that was stored on the NAS: if not everything, then as much as possible. This data is vital to my friend's company.

What I have tried already

  1. 1st attempt: Mounting the disks individually

    As a very first try, hoping it would just work, I took each disk and tried to mount it on its own:

    root@ubuntu:~# mount /dev/sdd6 /mnt/n
    
    -or-
    
    root@ubuntu:~# mount /dev/sde6 /mnt/n
    

    both gave me the same message:

    mount: unknown filesystem type 'linux_raid_member'
    
  2. 2nd attempt: Creating a RAID 1 array and trying to mount it

    OK, if I cannot mount the disks individually, then I need to create an array. Let's suppose (the most likely option) that the original configuration was RAID 1, and use one disk at a time:

    root@ubuntu:~# mdadm --create --run --level=1 --raid-devices=2 \
                      /dev/md/md-singolo-e6--create-missing /dev/sde6 missing
    

    gives:

    mdadm: /dev/sde6 appears to be part of a raid array:    
    level=raid0    
    devices=2    
    ctime=Mon Sep 26 10:23:48 2011    
    mdadm: Note: this array has metadata at the start and may not be suitable as a boot device. If you plan to store '/boot' on this device please ensure that your boot-loader understands md/v1.x metadata, or use --metadata=0.90
    mdadm: Defaulting to version 1.2 metadata    
    mdadm: array /dev/md/md-singolo-e6--create-missing started.
    

    So, it seems that the original array was RAID 0, not RAID 1. Bad news, as one disk has sector problems.

    Anyway, I gave mounting the newly created RAID 1 array a try (even though I knew it made no sense):

    root@ubuntu:~# mkdir /mnt/md-singolo-e6--create-missing    
    root@ubuntu:~# mount /dev/md/md-singolo-e6--create-missing \
                     /mnt/md-singolo-e6--create-missing/
    

    gave:

    mount: /dev/md127: can't read superblock
    

    The other disk gave exactly the same result.

  3. 3rd attempt: Creating a RAID 0 array and trying to mount it

    OK, since the metadata says it was RAID 0, let's go with that:

    root@ubuntu:~# mdadm --create --run --level=0 --raid-devices=2 \
                       /dev/md/md001hw /dev/sdd6 /dev/sde6 
    

    gives:

    mdadm: /dev/sdd6 appears to be part of a raid array:
    level=raid1
    devices=2
    ctime=Mon Oct 14 16:38:33 2013
    mdadm: /dev/sde6 appears to be part of a raid array:
    level=raid1
    devices=2
    ctime=Mon Oct 14 17:01:01 2013
    mdadm: Defaulting to version 1.2 metadata
    mdadm: array /dev/md/md001hw started.
    

    OK, once created, I tried to mount it:

    root@ubuntu:~# mount /dev/md/md001hw /mnt/n
    
    mount: you must specify the filesystem type
    

    At this point, specifying ext2, ext3 or ext4 with -t all gave errors.

  4. 4th attempt: Creating disk images and working with them

    OK, since one disk has problems it is much better to work on a copy (dd) of each data partition, padded with zeros (sync) wherever a block read error occurs (noerror). I therefore created the two images:

    This one for the good disk (4 MB blocks, to be faster):

    root@ubuntu:~# dd bs=4M if=/dev/sde6 of=/media/pietro/4TBexthdd/sde6-bs4M-noerror-sync.img conv=noerror,sync
    

    and this one for the disk with problems (minimum block size, to be safer):

    root@ubuntu:~# dd if=/dev/sdd6 of=/media/pietro/4TBexthdd/sdd6-noerror-sync.img conv=noerror,sync
    

    Once I had the two images, I tried to use them to build the RAID 0 array with the command given above. No luck: mdadm answered that an image "is not a block device" and did not create the array.

  5. 5th attempt: Going byte by byte to rescue some data

    OK, if proper mounting is not working, let's try to extract data by reading byte by byte and carving on header and footer information. I used foremost to do this job on each single disk. For disk 1:

    root@ubuntu:~# foremost -i /dev/sde6 -o /media/pietro/4TBexthdd/foremost_da_sde6/
    

    It creates sub-folders named after file extensions, but they are completely empty. Whereas for disk 2 (the damaged one):

    root@ubuntu:~# foremost -i /dev/sdd6 -o /media/pietro/4TBexthdd/foremost_da_sdd6_disco2/
    

    foremost does not even create the sub-folder structure.

    Same result when I ran foremost on the RAID 0 array:

    root@ubuntu:~# foremost -i /dev/md/md001hw -o /media/pietro/4TBexthdd/foremost_da_raid_hw/
    

    No sub-folder structure was created here either.

Where I need some help / My Questions

  • First and foremost: how can I rescue the data? Does anyone have a hint I've not tried yet?
  • Can anyone suggest an approach different from what I've already done?

Other questions:

  • I'm new to mdadm, did I do everything correctly?
  • Was the original array really created on Sept 26th, 2011, in RAID 0 mode?
  • Why can't I use the partition images to create an array?

Appendix

This is the dmesg output when reading from the failing disk (/dev/sdd):

[  958.802966] sd 8:0:0:0: [sdd] Unhandled sense code
[  958.802976] sd 8:0:0:0: [sdd]  
[  958.802980] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  958.802984] sd 8:0:0:0: [sdd]  
[  958.802987] Sense Key : Medium Error [current] 
[  958.802994] sd 8:0:0:0: [sdd]  
[  958.802999] Add. Sense: Unrecovered read error
[  958.803003] sd 8:0:0:0: [sdd] CDB: 
[  958.803006] Read(10): 28 00 00 d5 c7 e0 00 00 f0 00
[  958.803021] end_request: critical target error, dev sdd, sector 14010336
[  958.803028] quiet_error: 36 callbacks suppressed
[  958.803032] Buffer I/O error on device sdd, logical block 1751292
[  958.803043] Buffer I/O error on device sdd, logical block 1751293
[  958.803048] Buffer I/O error on device sdd, logical block 1751294
[  958.803052] Buffer I/O error on device sdd, logical block 1751295
[  958.803057] Buffer I/O error on device sdd, logical block 1751296
[  958.803061] Buffer I/O error on device sdd, logical block 1751297
[  958.803065] Buffer I/O error on device sdd, logical block 1751298
[  958.803069] Buffer I/O error on device sdd, logical block 1751299
[  958.803074] Buffer I/O error on device sdd, logical block 1751300
[  958.803078] Buffer I/O error on device sdd, logical block 1751301
[  961.621228] sd 8:0:0:0: [sdd] Unhandled sense code
[  961.621236] sd 8:0:0:0: [sdd]  
[  961.621238] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  961.621241] sd 8:0:0:0: [sdd]  
[  961.621243] Sense Key : Medium Error [current] 
[  961.621248] sd 8:0:0:0: [sdd]  
[  961.621251] Add. Sense: Unrecovered read error
[  961.621254] sd 8:0:0:0: [sdd] CDB: 
[  961.621255] Read(10): 28 00 00 d5 c8 d0 00 00 10 00
[  961.621266] end_request: critical target error, dev sdd, sector 14010576
[  964.791077] sd 8:0:0:0: [sdd] Unhandled sense code
[  964.791084] sd 8:0:0:0: [sdd]  
[  964.791087] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  964.791090] sd 8:0:0:0: [sdd]  
[  964.791092] Sense Key : Medium Error [current] 
[  964.791096] sd 8:0:0:0: [sdd]  
[  964.791099] Add. Sense: Unrecovered read error
[  964.791102] sd 8:0:0:0: [sdd] CDB: 
[  964.791104] Read(10): 28 00 00 d5 c8 00 00 00 08 00
[  964.791114] end_request: critical target error, dev sdd, sector 14010368
[  964.791119] quiet_error: 22 callbacks suppressed
[  964.791122] Buffer I/O error on device sdd, logical block 1751296

Best Answer

I hate to be the bearer of bad news, but...

Q: I'm new to mdadm, did I do everything correctly?

A: No. In fact, you did just about everything in the most destructive way possible. You used --create to destroy the array metadata, instead of using --assemble which probably would have allowed you to read the data (at least, to the extent the disk is capable of doing so). In doing so, you have lost critical metadata (in particular, the disk order, data offset, and chunk size).

In addition, --create may have scribbled array metadata on top of critical filesystem structures.

Finally, in your step (3), I see that mdadm is complaining of RAID1 on both disks—I'm hoping that's from you trying (2) on both disks, individually. I sincerely hope you didn't let RAID1 start trying to sync the disks (say, had you added both to the same RAID1 array).
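
For reference, the non-destructive route would have been a plain assemble, something along these lines (a sketch only; /dev/md0 is just an arbitrary name, and the command reads the existing superblocks instead of overwriting them):

    mdadm --assemble /dev/md0 /dev/sdd6 /dev/sde6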

What to do now

It seems like you've finally created images of the drives. You ought to have done this first, at least before trying anything beyond a basic --assemble. But anyway,

  • If the image of the bad drive missed most/all sectors, determine if professional data recovery is worthwhile. Files (and filesystem metadata) are split across drives in RAID0, so you really need both to recover. Professional recovery will probably be able to read the drive.

  • If the image is mostly OK, except for a few sectors, continue.

Make a copy of the image files. Only work on the copies of the image files. I cannot emphasize this enough: you will likely be destroying these copies several times, and you need to be able to start over. And you don't want to have to image the disks again, especially since one is failing!
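
Something as simple as this will do (file names taken from your dd step above; the "-work" suffix is just a naming suggestion):

    cp /media/pietro/4TBexthdd/sde6-bs4M-noerror-sync.img /media/pietro/4TBexthdd/sde6-work.img
    cp /media/pietro/4TBexthdd/sdd6-noerror-sync.img /media/pietro/4TBexthdd/sdd6-work.img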

To answer one of your other questions:

Q: Why can't I use the partition images to create an array?

A: To assemble (or create) an array of image files, you need to use a loopback device. You attach an image to a loopback device using losetup. Read the manpage, but it'll be something along the lines of losetup --show -f /path/to/COPY-of-image. Now, you use mdadm on the loop devices (e.g., /dev/loop0).
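
A minimal sketch, using the working copies suggested above (the loop device numbers will vary; --show prints the one you were assigned):

    losetup --show -f /media/pietro/4TBexthdd/sde6-work.img    # prints e.g. /dev/loop0
    losetup --show -f /media/pietro/4TBexthdd/sdd6-work.img    # prints e.g. /dev/loop1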

Determine the original array layout

You need to find out all the mdadm options that were originally used to create the array (since you destroyed that metadata with --create earlier). You then get to run --create on the two loopback devices, with those options, exactly. You need to figure out the metadata version (-e), the RAID level (-l, appears to be 0), the chunk size (-c), number of devices (-n, should be 2) and the exact order of the devices.

The easiest way to get this is going to be to get two new disks, put them in the NAS, and have the NAS create a new array on them. Preferably with the same NAS firmware version as originally used. IOW, repeat the initial set-up. Then pull the disks out and use mdadm -E on one of the members. Here is an example from a RAID10 array, so slightly different; I've omitted a bunch of lines to highlight the ones you need:

        Version : 1.0                 # -e
     Raid Level : raid10              # -l
   Raid Devices : 4                   # -n

     Chunk Size : 512K                # -c

   Device Role : Active device 0                         # gets you the device order
   Array State : AAAA ('A' == active, '.' == missing)

NOTE: I'm going to assume you're using ext2/3/4 here; if not, use the appropriate utilities for the filesystem the NAS actually used.

Attempt a create (on the loopback devices) with those options. See if e2fsck -n even recognizes it. If not, stop the array, and create it again with the devices in the other order. Try e2fsck -n again.
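
Concretely, that check-and-retry loop might look something like this (the md name, chunk size and metadata version here are placeholders, to be replaced with whatever you recover from the NAS):

    mdadm --create --run /dev/md0 --metadata=1.2 --level=0 --raid-devices=2 \
          --chunk=512 /dev/loop0 /dev/loop1
    e2fsck -n /dev/md0
    # if e2fsck doesn't even see a filesystem, stop and retry with the order swapped
    mdadm --stop /dev/md0
    mdadm --create --run /dev/md0 --metadata=1.2 --level=0 --raid-devices=2 \
          --chunk=512 /dev/loop1 /dev/loop0
    e2fsck -n /dev/md0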

If neither order works, you should go back to the order you think is right and try a backup superblock. The e2fsck manpage tells you what number to use; you almost certainly have a 4K block size. If none of the backup superblocks work, stop the array and try the other disk order. If that doesn't work, you probably have the wrong --create options; start over with a fresh copy of the images and try some different options; I'd try different metadata versions first.
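
A sketch of that step (32768 is the usual location of the first backup superblock for a 4K-block ext filesystem; mke2fs with -n only prints what it would do and writes nothing, so don't drop the -n):

    mke2fs -n /dev/md0                     # lists the backup superblock locations it would use
    e2fsck -n -B 4096 -b 32768 /dev/md0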

Once you get e2fsck to run, see how badly damaged the filesystem is. If it's completely trashed, that may mean you have the wrong chunk size (stop and re-create the array to try some more).

Copy the data off.

I suggest letting e2fsck try to fix the filesystem. This does risk destroying the filesystem, but, well, that's why you're working on copies! Then you can mount it, and copy the data off. Keep in mind that some of the data is likely corrupted, and that corruption may be hidden (e.g., a page of a document could have been replaced with NULLs).
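
For example (the mount point and destination are placeholders; the destination should of course be a different disk than the one holding the images):

    e2fsck -y /dev/md0                    # let it repair the working copy
    mkdir -p /mnt/recovered
    mount -o ro /dev/md0 /mnt/recovered   # read-only is enough for copying out
    rsync -a /mnt/recovered/ /media/pietro/4TBexthdd/recovered-data/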

I can't get the original parameters from the NAS

Then you're in trouble. Your other option is to take guesses until one finally works, or to learn enough about the on-disk formats to figure it out using a hex editor. There may be a utility or two out there to help with this; I don't know.

Alternatively, hire a data recovery firm.