RAID6 scrubbing mismatch repair

You can initiate a scrub of an mdadm array with echo 'check' > /sys/block/mdX/md/sync_action. If a bad sector is found during the scrub, it is rewritten automatically (from a mirror, or from parity information for RAID5/6).

However, if all blocks read successfully but are found not to be consistent with each other, this is regarded as a mismatch. In this case repair is complicated, because mdadm cannot tell which mirror contains the correct data (RAID1/10) or whether it is the data or the parity that is corrupted (RAID5).
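For anyone scripting this, here is a minimal Python sketch of starting a check and reading back the results. It relies only on the sync_action and mismatch_cnt files under /sys/block/mdX/md/. The function names and the sysfs_root parameter are my own, added so the code can be tried against a scratch directory; writing to the real sysfs file needs root.

```python
from pathlib import Path


def md_dir(array, sysfs_root="/sys/block"):
    """Return the md attribute directory for an array name such as 'md0'."""
    return Path(sysfs_root) / array / "md"


def start_check(array, sysfs_root="/sys/block"):
    """Same effect as: echo check > /sys/block/<array>/md/sync_action"""
    (md_dir(array, sysfs_root) / "sync_action").write_text("check\n")


def current_action(array, sysfs_root="/sys/block"):
    """Return the current sync action, e.g. 'idle' or 'check'."""
    return (md_dir(array, sysfs_root) / "sync_action").read_text().strip()


def mismatch_count(array, sysfs_root="/sys/block"):
    """Return the mismatch counter reported by the most recent scrub."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text().strip())
```

A typical use would be to call start_check("md0"), wait until current_action("md0") returns "idle" again, and then look at mismatch_count("md0"). A non-zero value means blocks were readable but inconsistent, which is the case asked about here.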

In theory this is not the case with RAID6, if I understand RAID6 correctly. Because double parity exists, it should be possible to pinpoint where a single corruption is, whether it is in the data or in the parity.

  1. Is my understanding correct? Should this be possible in theory?
  2. If correct, is mdadm able to repair this inconsistent data without guessing which block is corrupted?

Best Answer

It is possible in theory: the data+parity gives you three opinions on what the data should be; if two of them are consistent, you can assume the third is the incorrect one and re-write it based on the first two.
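To make the "three opinions" idea concrete, here is a small Python sketch of how the P and Q syndromes can point at a single corrupted block. It assumes the usual RAID6 construction (P is the XOR of the data blocks, Q is a weighted sum over GF(2^8) with polynomial 0x11d and generator 2) and at most one corrupted block per stripe. The names compute_pq and locate are mine; this is an illustration of the arithmetic, not how md or raid6check is actually written.

```python
# Log/antilog tables for GF(2^8), polynomial 0x11d, generator 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]


def compute_pq(data):
    """data: list of equal-length bytes objects, one per data disk."""
    n = len(data[0])
    p = bytearray(n)
    q = bytearray(n)
    for i, block in enumerate(data):
        coeff = EXP[i % 255]  # 2**i in GF(2^8)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate(data, p, q):
    """Return 'ok', 'P', 'Q', ('data', index) or 'unknown'."""
    cp, cq = compute_pq(data)
    sp = bytes(a ^ b for a, b in zip(p, cp))  # P syndrome
    sq = bytes(a ^ b for a, b in zip(q, cq))  # Q syndrome
    if not any(sp) and not any(sq):
        return "ok"
    if not any(sq):
        return "P"  # only P disagrees, so P itself is wrong
    if not any(sp):
        return "Q"  # only Q disagrees, so Q itself is wrong
    # Both disagree: for a bad data disk z, sq = 2**z * sp in every byte.
    candidates = set()
    for a, b in zip(sp, sq):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return "unknown"
        candidates.add((LOG[b] - LOG[a]) % 255)
    if len(candidates) == 1:
        z = candidates.pop()
        if z < len(data):
            return ("data", z)
    return "unknown"


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
p, q = compute_pq(data)
bad = list(data)
bad[2] = b"C\x00CC"  # silently corrupt one byte on data disk 2
print(locate(bad, p, q))  # ('data', 2)
```

Once the bad data disk is known, XORing the P syndrome into that block restores it. If two blocks in the same stripe are corrupted, this approach can point at the wrong disk, which is one reason to treat the result as a hint rather than proof.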

Linux RAID6 does not do this. Instead, any time a repair encounters a mismatch, the two parity values are assumed to be incorrect and are recalculated from the data values. There have been proposals to change to a "majority vote" system, but this has not been implemented.

The mdadm package includes the raid6check utility that attempts to figure out which disk is bad in the event of a parity mismatch, but it has some rough edges, is not installed by default, and doesn't fix the errors it finds.
