“DegradedArray event”

raid raid1 software-raid

I have a RAID 1 on my Raspi and got mail I don't understand titled "DegradedArray event on /dev/md0:my-host-name" and "Fail event on /dev/md0:my-host-name". I got 6 messages with the former subject and 2 with the latter.

The first kind of mail looks like this:

This is an automatically generated mail message from mdadm running on
my-host-name

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md0 : active raid1 sda1[0]
      124967936 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

The second kind looks like this:

This is an automatically generated mail message from mdadm running on
my-host-name

A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sdb1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      124967936 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

I restarted my Raspi before learning about these messages. This is the current RAID status:

Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      124967936 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

I did not touch the Raspi or its drives when the event happened. However, I cannot exclude that someone else did and possibly loosened the connection of one of the data cables.

Did this happen? Did something else happen? Did the RAID restore itself to normal operation or do I have to do something?

Possibly related: Meaning of Security Information Mail

Best Answer

The first message means that your RAID array went into a degraded state: it kept running on sda1 alone ([2/1] [U_]) because sdb1 was marked as failed, which is what the second message reports. The current status ([2/2] [UU]) shows that the array has been restored. You may still want to check the output of smartctl --all /dev/sdb: the overall health status is printed before the drive attributes and the error log. Look for suspicious attributes (things like Reallocated Sector Count or Current Pending Sector hint at a potential problem) and for (new) entries in the error log. You may also want to check dmesg for messages related to sdb.
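To make the status notation concrete, here is a minimal Python sketch that reads the text of /proc/mdstat and reports whether an array is degraded and which members are flagged as failed. It assumes the usual mdstat layout, with members written like sdb1[1](F) and a status like [2/1] [U_]. The function name summarize_mdstat and the sample text are made up for illustration, and only the first array in the text is considered.

```python
import re

# Status part of an mdstat entry, e.g. "[2/1] [U_]":
# expected members / active members, then one U (up) or _ (missing) per slot.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Member entry, e.g. "sda1[0]" or "sdb1[1](F)"; "(F)" marks a failed member.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](\(F\))?")


def summarize_mdstat(text):
    """Summarize the first array status found in mdstat-style text."""
    status = STATUS_RE.search(text)
    if status is None:
        raise ValueError("no array status found")
    expected = int(status.group(1))
    active = int(status.group(2))
    failed = [name for name, _slot, flag in MEMBER_RE.findall(text) if flag]
    return {
        "expected": expected,
        "active": active,
        "slots": status.group(3),
        "degraded": active < expected,
        "failed": failed,
    }


# Illustrative sample modelled on the "Fail event" mail (assumed layout).
FAIL_SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      124967936 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
"""

if __name__ == "__main__":
    print(summarize_mdstat(FAIL_SAMPLE))
```

On the real system you would pass the contents of /proc/mdstat instead of the sample string, for example open("/proc/mdstat").read().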

You could also, for extra safety, remove sdb1 from the RAID array and run a self-test on the drive with smartctl (e.g. smartctl -t short /dev/sdb for a short test or smartctl -t long /dev/sdb for a more thorough one).

Please note that you need to pass -d <...> to smartctl, with a parameter <...> that fits your device. Refer to this list of supported USB devices for the correct one. To get the USB IDs, you can use lsusb. If your device is not listed, look for related devices (e.g. by the same vendor or with a similar name).
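As a rough sketch of how the smartctl calls mentioned above fit together, the following Python helper only assembles the command lines and prints them. It does not run anything. The helper name smartctl_command is hypothetical, and the device type sat is just a placeholder assumption: the right -d value depends on your USB bridge.

```python
import shlex


def smartctl_command(device, *options, device_type=None):
    """Build the argument list for a single smartctl call."""
    args = ["smartctl"]
    if device_type:
        # -d selects the device type; needed for many USB enclosures.
        args += ["-d", device_type]
    args += list(options)
    args.append(device)
    return args


if __name__ == "__main__":
    # "sat" is an assumed example value, not a recommendation for your hardware.
    dev_type = "sat"
    commands = [
        smartctl_command("/dev/sdb", "--all", device_type=dev_type),
        smartctl_command("/dev/sdb", "-t", "short", device_type=dev_type),
        smartctl_command("/dev/sdb", "-t", "long", device_type=dev_type),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Each list can be handed to subprocess.run as root once you have confirmed the device name and type for your setup.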
