“DegradedArray event”

raid raid1 software-raid

I have a RAID 1 on my Raspi and got mail I don't understand titled "DegradedArray event on /dev/md0:my-host-name" and "Fail event on /dev/md0:my-host-name". I got 6 messages with the former subject and 2 with the latter.

The first kind of mail looks like this:

This is an automatically generated mail message from mdadm running on
my-host-name

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md0 : active raid1 sda1[0]
      124967936 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

The second kind of mail looks like this:

This is an automatically generated mail message from mdadm running on
my-host-name

A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sdb1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      124967936 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

I restarted my Raspi before learning about these messages. This is the current RAID status:

Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      124967936 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

I did not touch the Raspi or its drives when the event happened. However, I cannot exclude that someone else did and possibly loosened the connection of one of the data cables.

Did this happen? Did something else happen? Did the RAID restore itself to normal operation or do I have to do something?

Possibly related: Meaning of Security Information Mail

Best Answer

The first message means that your RAID array went into a degraded state: one of its two members dropped out, apparently because the sdb drive was detected as failing (second message). That is what [2/1] [U_] in /proc/mdstat says: only one of the two devices was active. The current status ([2/2] [UU]) indicates that the array has been restored.

You may still want to check the output of smartctl --all /dev/sdb. It prints the current health status first, followed by the drive parameters and the error log. Check whether any drive parameters look suspicious (non-zero values for Reallocated Sector Count or Current Pending Sector hint at a potential problem) and whether the error log has new entries. You may also want to check dmesg for messages related to sdb.
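If you want to check for this state from a script instead of waiting for the mail, the status field can be parsed out of /proc/mdstat. The following is only a minimal sketch: degraded_arrays is a made-up helper name, and it assumes the arrays are named mdN and that a "_" in the [U_] field marks a missing member, as in the output quoted in the question.

```shell
# Print the name of every md array whose member status field (e.g. [U_])
# contains "_", meaning at least one member is missing or failed.
# Reads /proc/mdstat by default; a file name can be passed for testing.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The status field consists only of U and _ inside square brackets.
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays without arguments prints nothing for a healthy array such as the [2/2] [UU] one shown in the question.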

You could also, for extra safety, remove sdb1 from the RAID array and run a self-test on the drive with smartctl (e.g. smartctl -t short /dev/sdb for a short test or smartctl -t long /dev/sdb for a more thorough one).

Please note that you need to use -d <...> for smartctl with a parameter <...> that fits your device. Refer to this list of supported USB devices for the correct one. To get the USB IDs, you can use lsusb. If your device is not listed, look for related devices (e.g. from the same vendor or with a similar name).
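Put together, the steps could look roughly like the sketch below. The device names are taken from the question; the -d value "sat" is only an assumption (many USB-to-SATA bridges accept it, yours may need a different one), and run, smart_report and start_selftest are hypothetical helper names. With DRY_RUN set, the commands are printed instead of executed, so you can review them first.

```shell
# Assumed defaults; override them in the environment to match your setup.
DEV=${DEV:-/dev/sdb}
PART=${PART:-/dev/sdb1}
ARRAY=${ARRAY:-/dev/md0}
SMART_TYPE=${SMART_TYPE:-sat}   # assumption, see the list of supported USB devices

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Health status, drive attributes and error log.
smart_report() {
    run smartctl -d "$SMART_TYPE" --all "$DEV"
}

# Take the member out of the array, then start a self-test
# ("short" by default, pass "long" for the thorough one).
start_selftest() {
    run mdadm "$ARRAY" --fail "$PART" --remove "$PART"
    run smartctl -d "$SMART_TYPE" -t "${1:-short}" "$DEV"
}
```

For example, DRY_RUN=1 followed by start_selftest long only prints the two commands. The self-test runs inside the drive, so its result shows up later in the smartctl --all output, and the partition has to be added back to the array afterwards.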

Related Solutions

Ubuntu – mdadm – RAID5 array size vs. actual disk size mismatch

fdisk is the wrong tool for disks >2TB. Use parted or gdisk instead.

It appears that /dev/sdc1 and /dev/sdd1 are 2TB partitions, so that's what limits your array size. For the other disks, they have GPT so I assume they are 3TB already, but you should check.
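To see why the 2TB partitions are the limit: RAID5 uses only as much of each member as the smallest member provides, and one member's worth of space goes to parity. Here is a small sketch of that arithmetic, assuming four members (two at 3TB, two at 2TB) and round sizes in GB; raid5_capacity is a made-up helper, not an mdadm command.

```shell
# Usable RAID5 capacity = (number of members - 1) * smallest member size.
# Arguments are member sizes as whole numbers in the same unit (GB here).
raid5_capacity() {
    min=$1
    count=0
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then
            min=$size
        fi
        count=$((count + 1))
    done
    echo $(( (count - 1) * min ))
}

raid5_capacity 3000 3000 2000 2000   # assumed current layout: prints 6000
raid5_capacity 3000 3000 3000 3000   # after enlarging every partition: prints 9000
```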

Basically you have to stop the array, enlarge each partition to 3TB (without changing the starting offset), then start it again and follow it up with a grow:

mdadm --grow /dev/md0 --size=max

If you can't stop the array, you'll have to fail each 2TB partition individually, repartition and re-add it. This might go faster if you add a write-intent bitmap first.

mdadm --grow /dev/md0 --bitmap=internal

Then for each disk individually,

mdadm /dev/md0 --fail /dev/disk1 # check mdstat for [UUUU] first
mdadm /dev/md0 --remove /dev/disk1
parted /dev/disk -- mklabel gpt mkpart primary 1mib -1mib
mdadm /dev/md0 --re-add /dev/disk1
mdadm --wait /dev/md0 # must wait for sync

Once that's done you can remove the bitmap again (keeping it may harm performance).

mdadm --grow /dev/md0 --bitmap=none
mdadm --grow /dev/md0 --size=max

Finally do your resize2fs or whatever.

Centos – How to resize / shifting partitions

Since you've partitioned your RAID as if it was a single disk, you can ignore the RAID altogether in this case. So it's merely a problem of resizing / shifting partitions.

So for example, you could shrink the www partition, delete the swap and then shift the root partition to the left in order to grow it.

Or, if that seems too complicated and you don't strictly need separate partitions, you could merge the root partition into your www partition, since that's already large enough to hold both root and www. That's kind of what I would do.

# mount stuff
mkdir /mnt/root /mnt/www
mount /dev/md0p5 /mnt/root
mount /dev/md0p2 /mnt/www

# since /mnt/www will be the new root, move www files to /var/www
mkdir -p /mnt/www/var/www
# mv will refuse to move "var" into itself; that one error can be ignored
mv /mnt/www/* /mnt/www/var/www/

# copy the root files
rsync -avAHSX /mnt/root/. /mnt/www/.

# comment out old root partition in fstab
# change /var/www to / in fstab

# update bootloader and reboot

This approach also has the advantage that if anything goes wrong, the original root partition is still intact, so you can revert the operation.

Once everything is working fine with the merged root+www partition, you can delete the old root partition and grow the merged partition to the full disk size.

Or you could decide that you want to stick with separate partitions after all and move the www files to the old root partition, if you think that's going to be large enough for your www in the foreseeable future.

Or you could shrink the www partition to make room for a new one.

Endless possibilities...
