A friend of mine has an mdadm RAID5 array with 9 disks that no longer reassembles.
After taking a look at the syslog I found that the disk sdi had been kicked from the array:
Jul 6 08:43:25 nasty kernel: [ 12.952194] md: bind<sdc>
Jul 6 08:43:25 nasty kernel: [ 12.952577] md: bind<sdd>
Jul 6 08:43:25 nasty kernel: [ 12.952683] md: bind<sde>
Jul 6 08:43:25 nasty kernel: [ 12.952784] md: bind<sdf>
Jul 6 08:43:25 nasty kernel: [ 12.952885] md: bind<sdg>
Jul 6 08:43:25 nasty kernel: [ 12.952981] md: bind<sdh>
Jul 6 08:43:25 nasty kernel: [ 12.953078] md: bind<sdi>
Jul 6 08:43:25 nasty kernel: [ 12.953169] md: bind<sdj>
Jul 6 08:43:25 nasty kernel: [ 12.953288] md: bind<sda>
Jul 6 08:43:25 nasty kernel: [ 12.953308] md: kicking non-fresh sdi from array!
Jul 6 08:43:25 nasty kernel: [ 12.953314] md: unbind<sdi>
Jul 6 08:43:25 nasty kernel: [ 12.960603] md: export_rdev(sdi)
Jul 6 08:43:25 nasty kernel: [ 12.969675] raid5: device sda operational as raid disk 0
Jul 6 08:43:25 nasty kernel: [ 12.969679] raid5: device sdj operational as raid disk 8
Jul 6 08:43:25 nasty kernel: [ 12.969682] raid5: device sdh operational as raid disk 6
Jul 6 08:43:25 nasty kernel: [ 12.969684] raid5: device sdg operational as raid disk 5
Jul 6 08:43:25 nasty kernel: [ 12.969687] raid5: device sdf operational as raid disk 4
Jul 6 08:43:25 nasty kernel: [ 12.969689] raid5: device sde operational as raid disk 3
Jul 6 08:43:25 nasty kernel: [ 12.969692] raid5: device sdd operational as raid disk 2
Jul 6 08:43:25 nasty kernel: [ 12.969694] raid5: device sdc operational as raid disk 1
Jul 6 08:43:25 nasty kernel: [ 12.970536] raid5: allocated 9542kB for md127
Jul 6 08:43:25 nasty kernel: [ 12.973975] 0: w=1 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973980] 8: w=2 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973983] 6: w=3 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973986] 5: w=4 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973989] 4: w=5 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973992] 3: w=6 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973996] 2: w=7 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973999] 1: w=8 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.974002] raid5: raid level 5 set md127 active with 8 out of 9 devices, algorithm 2
Unfortunately this went unnoticed, and now another drive has been kicked as well (sde):
Jul 14 08:02:45 nasty kernel: [ 12.918556] md: bind<sdc>
Jul 14 08:02:45 nasty kernel: [ 12.919043] md: bind<sdd>
Jul 14 08:02:45 nasty kernel: [ 12.919158] md: bind<sde>
Jul 14 08:02:45 nasty kernel: [ 12.919260] md: bind<sdf>
Jul 14 08:02:45 nasty kernel: [ 12.919361] md: bind<sdg>
Jul 14 08:02:45 nasty kernel: [ 12.919461] md: bind<sdh>
Jul 14 08:02:45 nasty kernel: [ 12.919556] md: bind<sdi>
Jul 14 08:02:45 nasty kernel: [ 12.919641] md: bind<sdj>
Jul 14 08:02:45 nasty kernel: [ 12.919756] md: bind<sda>
Jul 14 08:02:45 nasty kernel: [ 12.919775] md: kicking non-fresh sdi from array!
Jul 14 08:02:45 nasty kernel: [ 12.919781] md: unbind<sdi>
Jul 14 08:02:45 nasty kernel: [ 12.928177] md: export_rdev(sdi)
Jul 14 08:02:45 nasty kernel: [ 12.928187] md: kicking non-fresh sde from array!
Jul 14 08:02:45 nasty kernel: [ 12.928198] md: unbind<sde>
Jul 14 08:02:45 nasty kernel: [ 12.936064] md: export_rdev(sde)
Jul 14 08:02:45 nasty kernel: [ 12.943900] raid5: device sda operational as raid disk 0
Jul 14 08:02:45 nasty kernel: [ 12.943904] raid5: device sdj operational as raid disk 8
Jul 14 08:02:45 nasty kernel: [ 12.943907] raid5: device sdh operational as raid disk 6
Jul 14 08:02:45 nasty kernel: [ 12.943909] raid5: device sdg operational as raid disk 5
Jul 14 08:02:45 nasty kernel: [ 12.943911] raid5: device sdf operational as raid disk 4
Jul 14 08:02:45 nasty kernel: [ 12.943914] raid5: device sdd operational as raid disk 2
Jul 14 08:02:45 nasty kernel: [ 12.943916] raid5: device sdc operational as raid disk 1
Jul 14 08:02:45 nasty kernel: [ 12.944776] raid5: allocated 9542kB for md127
Jul 14 08:02:45 nasty kernel: [ 12.944861] 0: w=1 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944864] 8: w=2 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944867] 6: w=3 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944871] 5: w=4 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944874] 4: w=5 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944877] 2: w=6 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944879] 1: w=7 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944882] raid5: not enough operational devices for md127 (2/9 failed)
Now the array does not start anymore.
However, every disk still seems to contain its RAID metadata:
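The output below presumably comes from examining each member's superblock, i.e. something like the command used later in the answer:
$ mdadm --examine /dev/sd[acdefghij]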
/dev/sda:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 8600bda9:18845be8:02187ecc:1bfad83a
Update Time : Mon Jul 14 00:45:35 2014
Checksum : e38d46e8 - correct
Events : 123132
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAA.AAA.A ('A' == active, '.' == missing)
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : fe612c05:f7a45b0a:e28feafe:891b2bda
Update Time : Mon Jul 14 00:45:35 2014
Checksum : 32bb628e - correct
Events : 123132
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAA.AAA.A ('A' == active, '.' == missing)
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 1d14616c:d30cadc7:6d042bb3:0d7f6631
Update Time : Mon Jul 14 00:45:35 2014
Checksum : 62bd5499 - correct
Events : 123132
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAA.AAA.A ('A' == active, '.' == missing)
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : a2babca3:1283654a:ef8075b5:aaf5d209
Update Time : Mon Jul 14 00:45:07 2014
Checksum : f78d6456 - correct
Events : 123123
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAAA.A ('A' == active, '.' == missing)
/dev/sdf:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : e67d566d:92aaafb4:24f5f16e:5ceb0db7
Update Time : Mon Jul 14 00:45:35 2014
Checksum : 9223b929 - correct
Events : 123132
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAA.AAA.A ('A' == active, '.' == missing)
/dev/sdg:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 2cee1d71:16c27acc:43e80d02:1da74eeb
Update Time : Mon Jul 14 00:45:35 2014
Checksum : 7512efd4 - correct
Events : 123132
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAA.AAA.A ('A' == active, '.' == missing)
/dev/sdh:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : c239f0ad:336cdb88:62c5ff46:c36ea5f8
Update Time : Mon Jul 14 00:45:35 2014
Checksum : c08e8a4d - correct
Events : 123132
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAA.AAA.A ('A' == active, '.' == missing)
/dev/sdi:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : d06c58f8:370a0535:b7e51073:f121f58c
Update Time : Mon Jul 14 00:45:07 2014
Checksum : 77844dcc - correct
Events : 0
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : AAAAAAA.A ('A' == active, '.' == missing)
/dev/sdj:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f2de262f:49d17fea:b9a475c1:b0cad0b7
Update Time : Mon Jul 14 00:45:35 2014
Checksum : dd0acfd9 - correct
Events : 123132
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAA.AAA.A ('A' == active, '.' == missing)
But as you can see, the two drives (sde, sdi) are in the active state (even though the RAID is stopped) and sdi is listed as a spare.
While sde has a slightly lower event count than most of the other drives (123123 instead of 123132), sdi has an event count of 0. So I think sde is almost up to date, but sdi is not …
We read online that a hard power-off can cause these "kicking non-fresh" messages, and my friend did indeed cause a hard power-off once or twice. So we followed the instructions we found online and tried to re-add sde to the array:
$ mdadm /dev/md127 --add /dev/sde
mdadm: add new device failed for /dev/sde as 9: Invalid argument
But that failed, and now mdadm --examine /dev/sde shows an event count of 0 for sde too (and it is now a spare, like sdi):
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
Name : nasty:stuff (local to host nasty)
Creation Time : Sun Mar 16 02:37:47 2014
Raid Level : raid5
Raid Devices : 9
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 689e0030:142122ae:7ab37935:c80ab400
Update Time : Mon Jul 14 00:45:35 2014
Checksum : 5e6c4cf7 - correct
Events : 0
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : AAA.AAA.A ('A' == active, '.' == missing)
We know that two failed drives usually mean the death of a RAID5. However, is there a way to add at least sde back to the array so that the data can be saved?
Best Answer
OK, it looks like we now have access to the RAID. At least the first files we checked looked good. So here is what we did:
The RAID recovery article on the kernel.org wiki suggests two possible solutions for our problem:
1. Using --assemble --force (also mentioned by derobert)
The article says that if the event counts differ only slightly, forcing the assembly has a good chance of working. In our case the drive sde had an event difference of 9, so there was a good chance that --force would work. However, after we had executed the --add command, the event count dropped to 0 and the drive was marked as spare. So we thought it better not to use --force.
2. Recreate the array
This solution is explicitly marked as dangerous because you can lose data if you do something wrong. However, this seemed to be the only option we had.
The idea is to create a new array on the existing RAID devices (that is, to overwrite the devices' superblocks) with the same configuration as the old array, and to explicitly tell mdadm that the array already existed and should be assumed clean.
Since the event count difference was just 9 and the only problem was that we had lost the superblock of sde, there was a good chance that writing new superblocks would give us access to our data... and it worked :-)
Our solution
Note: This solution was specifically geared to our problem and may not work on your setup. You should take these notes to get an idea of how things can be done, but you need to research what is best in your own case.
Backup
We had already lost one superblock. So this time we saved the first and the last gigabyte of each RAID device (sd[acdefghij]) using dd before working on the array. We did this for each RAID device:
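A minimal sketch of such a backup (the backup location /mnt/backup and the file names are only examples, not necessarily the commands actually used):
for d in sda sdc sdd sde sdf sdg sdh sdi sdj; do
    size=$(blockdev --getsize64 /dev/$d)                  # device size in bytes
    # first gigabyte of the device
    dd if=/dev/$d of=/mnt/backup/$d-first1G.img bs=1M count=1024
    # roughly the last gigabyte of the device
    dd if=/dev/$d of=/mnt/backup/$d-last1G.img bs=1M skip=$(( size / 1048576 - 1024 ))
done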
Gather information
When recreating the array it is important to use the same configuration as the old one. This is especially important if you want to recreate the array on another machine using a different mdadm version. In that case mdadm's default values may be different and could create superblocks that do not match the existing array (see the wiki article).
In our case we used the same machine (and thus the same mdadm version) to recreate the array. However, the array had originally been created by a third-party tool, so we did not want to rely on default values and had to gather some information about the existing array.
From the output of mdadm --examine /dev/sd[acdefghij] we get the following information about the array (note: sdb was the SSD containing the OS and was not part of the RAID):
Version : 1.2
Raid Level : raid5
Raid Devices : 9
Used Dev Size : 7814034432
Data Offset : 2048 sectors
Layout : left-symmetric
Chunk Size : 512K
The Used Dev Size is denominated in blocks of 512 bytes. You can check this:
7814034432*512/1000000000 ~= 4000.79
But mdadm requires the size in Kibibytes:
7814034432*512/1024 = 3907017216
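As a quick sanity check, the conversion can be reproduced in any shell (purely illustrative):
$ echo $(( 7814034432 * 512 / 1024 ))
3907017216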
Important is the Device Role. In the new array each device must get the same role it had before. In our case:
/dev/sda : Active device 0
/dev/sdc : Active device 1
/dev/sdd : Active device 2
/dev/sde : Active device 3
/dev/sdf : Active device 4
/dev/sdg : Active device 5
/dev/sdh : Active device 6
/dev/sdi : Active device 7 (its superblock now claims "spare"; slot 7 is the one we leave out below)
/dev/sdj : Active device 8
Note: Drive letters (and thus the order) can change after reboot!
We also need the layout and the chunk size in the next step.
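A quick way to pull exactly these fields from every member (a sketch; adjust the device list to your setup):
$ mdadm --examine /dev/sd[acdefghij] | grep -E '^/dev/|Used Dev Size|Device Role|Layout|Chunk Size'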
Recreate raid
We can now use the information from the last step to recreate the array:
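A sketch of what such a command can look like, put together from the values gathered above (not necessarily the exact command that was run; verify every parameter, in particular --size, the metadata version and the device order, against your own --examine output before running anything this destructive):
mdadm --create /dev/md127 --assume-clean \
      --metadata=1.2 --level=5 --raid-devices=9 \
      --chunk=512 --layout=left-symmetric --size=3907017216 \
      /dev/sda /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh missing /dev/sdj
Here --assume-clean prevents mdadm from resyncing and the keyword "missing" keeps slot 7 empty. Afterwards the Data Offset reported by --examine should still be 2048 sectors; if your mdadm version picks a different default offset, the data will not line up.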
It is important to pass the devices in the correct order!
Moreover, we did not add sdi because its event count was too low; instead we set RAID slot 7 to missing. Thus the RAID5 contains 8 of 9 devices and is assembled in degraded mode. And because it lacks a spare device, no rebuild starts automatically.
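To confirm that the array really came up degraded with 8 of 9 devices and that no rebuild is running, something like the following can be used (illustrative):
$ cat /proc/mdstat
$ mdadm --detail /dev/md127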
Then we used --examine to check whether the new superblocks matched our old superblocks. And they did :-) We were able to mount the filesystem and read the data. The next step is to back up the data, then add sdi back and start the rebuild.
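Once the backup is done, re-adding sdi would look much like the --add attempt from the question (illustrative; do this only after the data is safely backed up):
$ mdadm /dev/md127 --add /dev/sdi
The rebuild onto sdi then shows up in /proc/mdstat.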