First how I got in this situation:
I had an RAID5 array with disks each 2TB (external USB-disks), I then wanted to create a larger encrypted array. Therefore I got 2 additional disks (also 2 TB each) and the plan was to run the original array in degraded mode, set up the new encrypted array, copy part of the data, after that shrink the original array to 2 disks degraded mode, enlarge the new one, copy the rest of the data and finally enlarge it to 7 disks RAID5 non-degraded. I did the whole procedure with /dev/loopX
devices 2GB each to test if my plan has any caveats.
Everything went well with the real array until to the point where one of the new disks failed. When I replaced this one, the order in which the disks are recognized by the kernel changed after the next reboot (/dev/sdb
, /dev/sdc
, … were all different disks than before). Everything got messed up and I didn't realize that until one of the disks got resynched as member of the wrong array. I spare the details of this story and get straight to the point:
I have now one encrypted array, 3-disk RAID5, degraded on /dev/sdc1
and /dev/sdd1
, running perfectly fine, all data there and a healthy file system according to fsck -f
.
So far so good.
The whole problem is now down to 3 disks – I can't get this non-encrypted array to work again. I am pretty sure the data HAS to be there on /dev/sdf1
, /dev/sdg1
, /dev/sdh1
, as this was a working array just before ONE of the disks might got messed up (accidentally got resynched as member of the other encrypted array, as said before). So, one of these three disks may have incorrect array data, but which one? And two of them have to be good, but how do I figure that out?
I tried every permutation of mdadm --create
… with /dev/sdf1
, /dev/sdg1
, /dev/sdh1
and "missing" like:
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdf1 /dev/sdg1 missing
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdf1 missing /dev/sdg1
...
and of course checked every time with
fsck /dev/md0
which complained about an invalid superblock.
Every time mdadm created the array, but there was no file system readable, it just contained garbage, none of the permutations used with mdadm finally worked.
So my question is now: What options do I have left? Besides lose my data and rebuild the array from scratch, of course.
Here some additional info (all the disks):
mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : cfee26c0:414eee94:e470810c:17141589
Name : merlin:0 (local to host merlin)
Creation Time : Sun Oct 28 11:38:32 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3906760704 (1862.89 GiB 2000.26 GB)
Array Size : 3906759680 (3725.78 GiB 4000.52 GB)
Used Dev Size : 3906759680 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f4f0753e:56b8d6a5:84ec2ce8:dbc933f0
Update Time : Sun Oct 28 11:38:32 2012
Checksum : 60093b72 - correct
Events : 0
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA. ('A' == active, '.' == missing)
mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 5cb45bae:7a4843ba:4ad7dbfb:5c129d2a
Name : merlin:1 (local to host merlin)
Creation Time : Wed Sep 26 07:32:32 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3906760704 (1862.89 GiB 2000.26 GB)
Array Size : 3905299456 (3724.38 GiB 3999.03 GB)
Used Dev Size : 3905299456 (1862.19 GiB 1999.51 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 9e2f9ae6:6c95d05e:8d83970b:f1308de0
Update Time : Fri Oct 26 03:26:37 2012
Checksum : 79d4964b - correct
Events : 220
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AA. ('A' == active, '.' == missing)
mdadm --examine /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 5cb45bae:7a4843ba:4ad7dbfb:5c129d2a
Name : merlin:1 (local to host merlin)
Creation Time : Wed Sep 26 07:32:32 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3906760704 (1862.89 GiB 2000.26 GB)
Array Size : 3905299456 (3724.38 GiB 3999.03 GB)
Used Dev Size : 3905299456 (1862.19 GiB 1999.51 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 98b07c41:ff4bea98:2a765a6b:63d820e0
Update Time : Fri Oct 26 03:26:37 2012
Checksum : 6e2767e8 - correct
Events : 220
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA. ('A' == active, '.' == missing)
mdadm --examine /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6db9959d:3cdd4bc3:32a241ad:a9f37a0c
Name : merlin:0 (local to host merlin)
Creation Time : Sun Oct 28 12:12:59 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3905299943 (1862.19 GiB 1999.51 GB)
Array Size : 3905299456 (3724.38 GiB 3999.03 GB)
Used Dev Size : 3905299456 (1862.19 GiB 1999.51 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 677a4410:8931e239:2c789f83:e130e6f7
Update Time : Sun Oct 28 12:12:59 2012
Checksum : 98cb1950 - correct
Events : 0
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : A.A ('A' == active, '.' == missing)
mdadm --examine /dev/sdf1
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 3700a0a6:3fadfd73:bc74b618:a5526767
Name : merlin:0 (local to host merlin)
Creation Time : Sun Oct 28 11:28:30 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3905392640 (1862.24 GiB 1999.56 GB)
Array Size : 3905391616 (3724.47 GiB 3999.12 GB)
Used Dev Size : 3905391616 (1862.24 GiB 1999.56 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 5a8a5423:10b7a542:26b5e2b3:f0887121
Update Time : Sun Oct 28 11:28:30 2012
Checksum : 8e90495f - correct
Events : 0
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA. ('A' == active, '.' == missing)
mdadm --examine /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 202255c9:786f474d:ba928527:68425dd6
Name : merlin:0 (local to host merlin)
Creation Time : Sun Oct 28 11:24:36 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3905299943 (1862.19 GiB 1999.51 GB)
Array Size : 3905299456 (3724.38 GiB 3999.03 GB)
Used Dev Size : 3905299456 (1862.19 GiB 1999.51 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 4605c729:c290febb:92901971:9a3ed814
Update Time : Sun Oct 28 11:24:36 2012
Checksum : 38ba4d0a - correct
Events : 0
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA. ('A' == active, '.' == missing)
mdadm --examine /dev/sdh1
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 682356f5:da2c442e:7bfc85f7:53aa9ea7
Name : merlin:0 (local to host merlin)
Creation Time : Sun Oct 28 12:13:44 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3906761858 (1862.89 GiB 2000.26 GB)
Array Size : 3906760704 (3725.78 GiB 4000.52 GB)
Used Dev Size : 3906760704 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 489943b3:d5e35022:f52c917a:9ca6ff2a
Update Time : Sun Oct 28 12:13:44 2012
Checksum : f6947a7d - correct
Events : 0
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA. ('A' == active, '.' == missing)
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdc1[0] sdd1[1]
3905299456 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
unused devices: <none>
Any help would be greatly appreciated!
Best Answer
If you just lost one disk, you should have been able to recover from that using the very much safer
--assemble
.You've run create now so much that all the UUIDs are different.
sdc1
andsdd1
share a UUID (expected, as that's your working array)... the rest the disks share a name, but all have different UUIDs. So I'm guessing none of those are the original superblocks. Too bad...Anyway, I'd guess you're either attempting to use the wrong disks, or you're trying to use the wrong chunk size (the default has changed over time, I believe). Your old array may have also used a different superblock version—that default has definitely changed—which could offset all the sectors (and also destroy some of the data). Finally, its possible you're using the wrong layout, though that's less likely.
It's also possible that, your test array was read-write (from a md standpoint) that attempts to use ext3 actually did some writes. E.g., a journal replay. But that's only if it found a superblock at some point, I think.
BTW: I think you really ought to be using
--assume-clean
, though of course a degraded array will not try to start rebuilding. Then you probably want to immediately set read-only.