First check the disks by running a SMART self-test on each one:
for i in a b c d; do
smartctl -s on -t long /dev/sd$i
done
It might take a few hours to finish, but you can check each drive's test status every few minutes, e.g.
smartctl -l selftest /dev/sda
If the status of a disk reports "not completed because of read errors", then that disk should be considered unsafe for md1 reassembly. After the self-tests finish, you can start trying to reassemble your array. Optionally, if you want to be extra cautious, move the disks to another machine before continuing (just in case of bad RAM/controller/etc.).
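To avoid checking each drive by hand, a quick loop over all four could look like the sketch below (it assumes the same sda-sdd device names as above; adjust for your system):

```shell
# Print the most recent self-test log entry for each suspect drive.
# Device names sda-sdd are an assumption; adjust to match your machine.
for i in a b c d; do
    echo "=== /dev/sd$i ==="
    smartctl -l selftest "/dev/sd$i" | grep -m1 'Extended offline'
done
```

A result of "Completed without error" is what you want to see; anything mentioning read failure means the drive is suspect.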
Recently, I had a case exactly like this one. One drive failed, I re-added it to the array, but during the rebuild 3 of the 4 drives failed altogether. The contents of /proc/mdstat were the same as yours (maybe not in the same order):
md1 : inactive sdc2[2](S) sdd2[4](S) sdb2[1](S) sda2[0](S)
But I was lucky and reassembled the array with this
mdadm --assemble /dev/md1 --scan --force
By looking at the --examine output you provided, I can tell the following scenario happened: sdd2 failed, you removed it and re-added it, so it became a spare drive trying to rebuild. But while it was rebuilding, sda2 failed and then sdb2 failed. So the event counter is highest on sdc2 and sdd2, which were the last active drives in the array (although sdd never got the chance to rebuild, so it is the most outdated of all). Because of the differences in the event counters, --force will be necessary. So you could also try this:
mdadm --assemble /dev/md1 /dev/sd[abc]2 --force
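Before running the forced assembly, it's worth comparing the event counters yourself, so you can confirm which drives are most in sync. A sketch (partition names taken from the layout above):

```shell
# Show the event counter and last update time recorded in each
# member's superblock; the drives with the highest event counts
# hold the most recent state of the array.
for d in /dev/sd[abcd]2; do
    echo "=== $d ==="
    mdadm --examine "$d" | grep -E 'Events|Update Time'
done
```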
To conclude, I think that if the above command fails, you should try to recreate the array like this:
mdadm --create /dev/md1 --assume-clean -l5 -n4 -c64 /dev/sd[abc]2 missing
If you do the --create, the missing part is important. Don't try to add a fourth drive to the array, because then reconstruction will begin and you will lose your data. Creating the array with a missing drive will not change its contents, and you'll have the chance to get a copy elsewhere (RAID 5 doesn't work the same way as RAID 1).
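Once the array is up in its degraded state, a cautious next step is to mount it read-only and copy the data off before attempting anything else. A sketch (the mount point and backup destination are example paths, not from your setup):

```shell
# Mount the degraded array read-only so nothing on it can change,
# then copy the data to a safe location before any further repairs.
mkdir -p /mnt/md1
mount -o ro /dev/md1 /mnt/md1
rsync -a /mnt/md1/ /backup/md1-rescue/   # destination is an example path
```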
If that fails to bring the array up, try the solution (a Perl script) described here: Recreating an array
If you finally manage to bring the array up, the filesystem will be unclean and probably corrupted. If one disk fails during a rebuild, the array is expected to stop and freeze, not doing any writes to the other disks. In this case two disks failed; maybe the system was performing write requests that it wasn't able to complete, so there is a small chance you lost some data, but also a chance that you will never notice it :-)
edit: some clarification added.
The point of RAID with redundancy is that it will keep going as long as it can, but obviously it will detect errors that put it into a degraded mode, such as a failing disk. You can show the current status of an array with mdadm -D:
# mdadm -D /dev/md0
<snip>
0 8 5 0 active sync /dev/sda5
1 8 23 1 active sync /dev/sdb7
Furthermore, the return status of mdadm -D is nonzero if there is any problem, such as a failed component (1 indicates an error that the RAID mode compensates for, and 2 indicates a complete failure).
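That makes mdadm -D convenient in a cron job or monitoring script. A minimal sketch, assuming an array at /dev/md0:

```shell
#!/bin/sh
# Check the array and report according to mdadm's exit status:
# 0 = healthy, 1 = degraded but compensated, other = serious failure.
mdadm -D /dev/md0 >/dev/null 2>&1
status=$?
case $status in
    0) echo "md0: healthy" ;;
    1) echo "md0: degraded (RAID is compensating)" >&2 ;;
    *) echo "md0: serious problem (exit status $status)" >&2 ;;
esac
```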
You can also get a quick summary of all RAID device statuses by looking at /proc/mdstat. You can get information about a RAID device in /sys/class/block/md*/md/* as well; see Documentation/md.txt in the kernel documentation. Some /sys entries are writable too; for example, you can trigger a full check of md0 with echo check >/sys/class/block/md0/md/sync_action.
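After such a check finishes, you can see whether any inconsistencies were found by reading mismatch_cnt from the same /sys directory. A sketch (md0 assumed, as above):

```shell
# Start a consistency check, wait for it to finish, then report
# how many mismatched sectors were found (0 means all consistent).
echo check > /sys/class/block/md0/md/sync_action
while [ "$(cat /sys/class/block/md0/md/sync_action)" != "idle" ]; do
    sleep 60
done
cat /sys/class/block/md0/md/mismatch_cnt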
In addition to these spot checks, mdadm can notify you as soon as something bad happens. Make sure that you have MAILADDR root in /etc/mdadm.conf (some distributions, e.g. Debian, set this up automatically). Then you will receive an email notification as soon as an error (a degraded array) occurs.
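mdadm's monitor mode can also run as a standalone daemon that polls the arrays and sends these alerts itself; many distributions start it for you, but it can be launched by hand like this (delay and recipient are example values):

```shell
# Run mdadm in monitor mode as a daemon: poll all arrays every
# 300 seconds and mail alerts to root when something goes wrong.
mdadm --monitor --scan --daemonise --delay=300 --mail=root
```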
Make sure that you do receive mail sent to root on the local machine (some modern distributions omit this, because they consider that all email goes through external providers; but receiving local mail is necessary for any serious system administrator). Test this by sending root a mail: echo hello | mail -s test root@localhost. Usually, a proper email setup requires two things:
- Run an MTA on your local machine. The MTA must be set up at least to allow local mail delivery. All distributions come with suitable MTAs; pick any (but not nullmailer if you want the email to be delivered locally).
- Redirect mail going to system accounts (at least root) to an address that you read regularly. This can be your account on the local machine, or an external email address. With most MTAs, the address can be configured in /etc/aliases; you should have a line like
root: djsmiley2k
for local delivery, or
root: djsmiley2k@mail-provider.example.com
for remote delivery. If you choose remote delivery, make sure that your MTA is configured for that. Depending on your MTA, you may need to run the newaliases command after editing /etc/aliases.
I'm sorry, but you've just hit the very common problem known as the "write hole". In short, you do not have any chance to recover your array. More information on Wikipedia: http://en.wikipedia.org/wiki/RAID_5_write_hole
Expensive RAID controllers are equipped with batteries to avoid this problem.
I hope you have a backup; that's your last chance.