Linux software RAID robustness for RAID1 vs other RAID levels

linux, mdadm, raid, redundancy

I have a RAID5 array running, and now also a RAID1 that I set up yesterday. Since RAID5 calculates parity, it should be able to catch silent data corruption on one disk. For RAID1, however, the disks are just mirrors. The more I think about it, the more I figure that RAID1 is actually quite risky: it will save me from a disk failure, but it might not be as good at protecting the data on the disk (which is actually more important to me).

  1. How does Linux software RAID actually store RAID1 type data on disk?
  2. How does it know which spindle is giving corrupt data (if the disk/subsystem is not reporting any errors)?

If RAID1 really gives me disk protection rather than data protection, are there tricks I can do with mdadm to create a two-disk "RAID5-like" setup? I.e. lose some capacity but keep redundancy for the data as well?

Best Answer

Focusing on the actual questions...

Even RAID 5 cannot correct silent bit rot, although it can detect it during a data scrub. It can, however, correct a single block that the disk has reported as an Unrecoverable Read Error (URE). Note that not all drives in a RAID 5 stripe are read during a normal data read, so if the error is in the part of the stripe on a disk that was not read, it will go undetected until you perform a data scrub. With any standard RAID level, silent bit rot can only be detected during a data scrub, and RAID 5 cannot even do that while rebuilding a failed disk; that limitation is where most of the current concerns about RAID 5 come from.
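
For reference, md's scrub is driven through sysfs. A minimal sketch, assuming the array is /dev/md0 (substitute your own device):

    # Request a check-only scrub; md reads every copy/parity block and compares them
    echo check > /sys/block/md0/md/sync_action

    # Watch progress
    cat /proc/mdstat

    # A non-zero count after the scrub means inconsistent mirrors/stripes were found
    cat /sys/block/md0/md/mismatch_cnt

    # 'repair' rewrites inconsistent stripes, but on RAID 1 it cannot know
    # which copy was the corrupt one; it just makes the copies consistent again
    echo repair > /sys/block/md0/md/sync_action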

  1. Linux mdadm RAID 1, like nearly all RAID 1 implementations, simply duplicates/mirrors the same data onto multiple disks. It adds no error correction or detection data of its own. If you take a disk out of almost any RAID 1 and use it in another PC, it will very likely just work as a normal single disk. Linux mdadm does add a small superblock to each member device so it knows which partitions belong to which array, so mdadm will know the disk was part of a RAID 1, but it can still assemble and use that single disk on its own (see the sketch after this list).
  2. All RAID 1 controllers, be they software or hardware, rely on the fact that HDDs use their own error detection and correction methods. See this Wikipedia article for some information on how HDDs do this; in particular, note the use of Error Correction Coding (ECC).
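
To see point 1 for yourself, mdadm can print the superblock it wrote on a member, and a single surviving mirror member can be assembled and mounted on its own. A rough sketch, with /dev/sdb1 and /dev/md0 as placeholder names:

    # Print the md superblock on one mirror member: array UUID, RAID level, member role
    mdadm --examine /dev/sdb1

    # Assemble an array from just that one member (degraded) and mount it
    mdadm --assemble /dev/md0 /dev/sdb1 --run
    mount /dev/md0 /mnt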

This is why most bit rot will be reported to mdadm as an Unrecoverable Read Error (URE) by the disk subsystem. However, there are still risks to your data that will not result in any error being reported by the disk, such as:

  • if there was a head positioning error during a write, so that some random nearby sector was overwritten with data and a correct ECC for that block. Reading the block that was actually written will then report success, even though it contains the wrong data.
  • if the server lost power before it had written its data to all of the disks in the array, some blocks in that stripe will disagree with the others.

and other types of errors such as those described on the ServerFault page Is bit rot on hard drives a real problem? What can be done about it?

RAID 6, and RAID 1 arrays with at least 3 disks, are the only standard RAID levels that even have the potential to detect and correct some forms of silent bit rot that are not reported as errors by the individual disks, by using a forward-error-correction style voting system. I do not know whether mdadm implements the required code for this.

  • For RAID 6 - only if the error is in one of the parity blocks. This is because a 3-way vote between the data, parity 1 and parity 2 is possible: if parity block 1 or 2 disagrees but the other two agree, the odd one out can essentially be outvoted. The reason it cannot correct the problem if the error is in one of the data blocks is that it cannot know which data block has the error, unless it is a 3-disk RAID 6, which is typically not allowed. I doubt that any implementation, including mdadm, bothers with such an obscure correction scheme; it will just report it as an error.
  • For RAID 1 with 3 or more active, supposedly already synchronised disks, it could conduct a simple majority vote. Again, I don't know whether any RAID implementation bothers with this logic, as not many people use a RAID 1 with 3+ disks (a 3-way mirror is easy to set up, see the sketch after this list). If the required logic were implemented, then in a RAID 1 that
    • normally has 3 disks, a block with silent bit rot could be auto-corrected, though not during a rebuild, as that would reduce the number of active, in-sync disks to 2.
    • has 4 disks, a stripe with a single silently bad block could be auto-corrected, even during a rebuild of 1 failed disk.
    • has 5 disks, a stripe with 2 silently bad blocks could be auto-corrected, though that drops to 1 if found during a rebuild of 1 or 2 simultaneously failed disks.
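
If you want to experiment with a 3-way mirror yourself, creating one with mdadm is straightforward; the device names below are placeholders:

    # Create a 3-way RAID 1 mirror: every member holds a full copy of the data
    mdadm --create /dev/md1 --level=1 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

    # Check that all three members are active and in sync
    mdadm --detail /dev/md1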

FYI, I noticed that the Synology DS1813+ uses mdadm for both data and system partitions, and that it uses RAID 1 across all 8 disks for the system partitions.

As you may have noticed, this places a lot of reliance on the disk being able to report bad data as an error. Everyone says to use ZFS to solve this issue; I believe ZFS's main data integrity improvements are that it effectively scrubs more often, because it verifies its independent block-level checksums on every read (which means many silently corrupted blocks are no longer silent and are corrected from a mirror or parity copy where possible), and it may also implement the voting logic above for silent data corruption.
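
For comparison, ZFS exposes scrubbing and per-device checksum error counters directly; a small sketch, with "tank" as a placeholder pool name:

    # Start a scrub: ZFS verifies every block against its checksum and
    # repairs from a good mirror/parity copy where it can
    zpool scrub tank

    # The CKSUM column counts blocks whose checksum did not match,
    # i.e. silent corruption that was caught
    zpool status -v tank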

To test whether a particular system can detect and/or correct silent data corruption, use the Linux dd command to write random data directly over part of one of the member partitions in the array, then check whether the data read back from the array is still good. Warning: do not run this test on a system holding data you want to keep, as your system may fail the test. For standard RAID levels you will need to perform a data scrub between corrupting the member and re-reading the data. A minimal sketch is below.
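
A destructive test sketch, assuming a throwaway RAID 1 array /dev/md1 built from /dev/sdb1 and /dev/sdc1 (all names are placeholders):

    # DESTRUCTIVE: only run this on a scratch array with disposable data.
    # Corrupt 1 MiB of one member behind md's back, seeking well past the
    # start of the partition so the md superblock is not hit.
    dd if=/dev/urandom of=/dev/sdb1 bs=1M count=1 seek=100 conv=fsync

    # Scrub so md compares the mirrors, then check the mismatch counter
    echo check > /sys/block/md1/md/sync_action
    cat /sys/block/md1/md/mismatch_cnt

    # Finally, re-read your test files from the mounted array (or compare
    # checksums taken before the corruption) to see if the data survived.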
