Ubuntu – RAID5 recovery trying with “mdadm –create … missing”

encryptionlinuxmdadmUbuntu

First how I got in this situation:

I had an RAID5 array with disks each 2TB (external USB-disks), I then wanted to create a larger encrypted array. Therefore I got 2 additional disks (also 2 TB each) and the plan was to run the original array in degraded mode, set up the new encrypted array, copy part of the data, after that shrink the original array to 2 disks degraded mode, enlarge the new one, copy the rest of the data and finally enlarge it to 7 disks RAID5 non-degraded. I did the whole procedure with /dev/loopX devices 2GB each to test if my plan has any caveats.

Everything went well with the real array until to the point where one of the new disks failed. When I replaced this one, the order in which the disks are recognized by the kernel changed after the next reboot (/dev/sdb, /dev/sdc, … were all different disks than before). Everything got messed up and I didn't realize that until one of the disks got resynched as member of the wrong array. I spare the details of this story and get straight to the point:

I have now one encrypted array, 3-disk RAID5, degraded on /dev/sdc1 and /dev/sdd1, running perfectly fine, all data there and a healthy file system according to fsck -f.
So far so good.

The whole problem is now down to 3 disks – I can't get this non-encrypted array to work again. I am pretty sure the data HAS to be there on /dev/sdf1, /dev/sdg1, /dev/sdh1, as this was a working array just before ONE of the disks might got messed up (accidentally got resynched as member of the other encrypted array, as said before). So, one of these three disks may have incorrect array data, but which one? And two of them have to be good, but how do I figure that out?

I tried every permutation of mdadm --create … with /dev/sdf1, /dev/sdg1, /dev/sdh1 and "missing" like:

mdadm --create /dev/md0 --level=5 --raid-devices=3  /dev/sdf1 /dev/sdg1 missing

mdadm --create /dev/md0 --level=5 --raid-devices=3  /dev/sdf1 missing /dev/sdg1

...

and of course checked every time with

fsck /dev/md0

which complained about an invalid superblock.

Every time mdadm created the array, but there was no file system readable, it just contained garbage, none of the permutations used with mdadm finally worked.
So my question is now: What options do I have left? Besides lose my data and rebuild the array from scratch, of course.

Here some additional info (all the disks):

mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : cfee26c0:414eee94:e470810c:17141589
           Name : merlin:0  (local to host merlin)
  Creation Time : Sun Oct 28 11:38:32 2012
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3906760704 (1862.89 GiB 2000.26 GB)
     Array Size : 3906759680 (3725.78 GiB 4000.52 GB)
  Used Dev Size : 3906759680 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f4f0753e:56b8d6a5:84ec2ce8:dbc933f0

    Update Time : Sun Oct 28 11:38:32 2012
       Checksum : 60093b72 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA. ('A' == active, '.' == missing)


mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 5cb45bae:7a4843ba:4ad7dbfb:5c129d2a
           Name : merlin:1  (local to host merlin)
  Creation Time : Wed Sep 26 07:32:32 2012
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3906760704 (1862.89 GiB 2000.26 GB)
     Array Size : 3905299456 (3724.38 GiB 3999.03 GB)
  Used Dev Size : 3905299456 (1862.19 GiB 1999.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 9e2f9ae6:6c95d05e:8d83970b:f1308de0

    Update Time : Fri Oct 26 03:26:37 2012
       Checksum : 79d4964b - correct
         Events : 220

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA. ('A' == active, '.' == missing)


mdadm --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 5cb45bae:7a4843ba:4ad7dbfb:5c129d2a
           Name : merlin:1  (local to host merlin)
  Creation Time : Wed Sep 26 07:32:32 2012
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3906760704 (1862.89 GiB 2000.26 GB)
     Array Size : 3905299456 (3724.38 GiB 3999.03 GB)
  Used Dev Size : 3905299456 (1862.19 GiB 1999.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 98b07c41:ff4bea98:2a765a6b:63d820e0

    Update Time : Fri Oct 26 03:26:37 2012
       Checksum : 6e2767e8 - correct
         Events : 220

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA. ('A' == active, '.' == missing)


mdadm --examine /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6db9959d:3cdd4bc3:32a241ad:a9f37a0c
           Name : merlin:0  (local to host merlin)
  Creation Time : Sun Oct 28 12:12:59 2012
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3905299943 (1862.19 GiB 1999.51 GB)
     Array Size : 3905299456 (3724.38 GiB 3999.03 GB)
  Used Dev Size : 3905299456 (1862.19 GiB 1999.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 677a4410:8931e239:2c789f83:e130e6f7

    Update Time : Sun Oct 28 12:12:59 2012
       Checksum : 98cb1950 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : A.A ('A' == active, '.' == missing)


mdadm --examine /dev/sdf1
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 3700a0a6:3fadfd73:bc74b618:a5526767
           Name : merlin:0  (local to host merlin)
  Creation Time : Sun Oct 28 11:28:30 2012
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3905392640 (1862.24 GiB 1999.56 GB)
     Array Size : 3905391616 (3724.47 GiB 3999.12 GB)
  Used Dev Size : 3905391616 (1862.24 GiB 1999.56 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 5a8a5423:10b7a542:26b5e2b3:f0887121

    Update Time : Sun Oct 28 11:28:30 2012
       Checksum : 8e90495f - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA. ('A' == active, '.' == missing)


mdadm --examine /dev/sdg1
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 202255c9:786f474d:ba928527:68425dd6
           Name : merlin:0  (local to host merlin)
  Creation Time : Sun Oct 28 11:24:36 2012
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3905299943 (1862.19 GiB 1999.51 GB)
     Array Size : 3905299456 (3724.38 GiB 3999.03 GB)
  Used Dev Size : 3905299456 (1862.19 GiB 1999.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 4605c729:c290febb:92901971:9a3ed814

    Update Time : Sun Oct 28 11:24:36 2012
       Checksum : 38ba4d0a - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA. ('A' == active, '.' == missing)


mdadm --examine /dev/sdh1
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 682356f5:da2c442e:7bfc85f7:53aa9ea7
           Name : merlin:0  (local to host merlin)
  Creation Time : Sun Oct 28 12:13:44 2012
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3906761858 (1862.89 GiB 2000.26 GB)
     Array Size : 3906760704 (3725.78 GiB 4000.52 GB)
  Used Dev Size : 3906760704 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 489943b3:d5e35022:f52c917a:9ca6ff2a

    Update Time : Sun Oct 28 12:13:44 2012
       Checksum : f6947a7d - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA. ('A' == active, '.' == missing)



cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid5 sdc1[0] sdd1[1]
      3905299456 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>

Any help would be greatly appreciated!

Best Answer

If you just lost one disk, you should have been able to recover from that using the very much safer --assemble.

You've run create now so much that all the UUIDs are different. sdc1 and sdd1 share a UUID (expected, as that's your working array)... the rest the disks share a name, but all have different UUIDs. So I'm guessing none of those are the original superblocks. Too bad...

Anyway, I'd guess you're either attempting to use the wrong disks, or you're trying to use the wrong chunk size (the default has changed over time, I believe). Your old array may have also used a different superblock version—that default has definitely changed—which could offset all the sectors (and also destroy some of the data). Finally, its possible you're using the wrong layout, though that's less likely.

It's also possible that, your test array was read-write (from a md standpoint) that attempts to use ext3 actually did some writes. E.g., a journal replay. But that's only if it found a superblock at some point, I think.

BTW: I think you really ought to be using --assume-clean, though of course a degraded array will not try to start rebuilding. Then you probably want to immediately set read-only.

Related Question