RAID-5 Array – How Parity Works


I'm looking to build a nice little RAID array for dedicated backups. I'd like to have about 2-4TB of space available, as I have this nasty little habit of digitizing everything. Thus, I need a lot of storage and a lot of redundancy in case of drive failure. I'll also essentially be backing up 2-3 computers' /home folders using one of the "Time Machine" clones for Linux. This array will be accessible over my local network via SSH.

I'm having difficulty understanding how RAID-5 achieves parity and how many drives it actually requires. One would assume it needs 5 drives, but I could be wrong. Most of the diagrams I've seen have only confused me further. Here is how I think RAID-5 works; please correct me, as I'm sure I'm not grasping it properly:

/---STORAGE---\    /---PARITY----\
|   DRIVE_1   |    |   DRIVE_4   |
|   DRIVE_2   |----|     ...     |
|   DRIVE_3   |    |             |
\-------------/    \-------------/

It seems that drives 1-3 appear and work as a single, massive drive (capacity * number_of_drives), and the parity drive(s) back up those drives. What seems strange to me is that diagrams usually show 3+ storage drives to only 1 or 2 parity drives. Say we're running four 1TB drives in a RAID-5 array, 3 for storage and 1 for parity: we have 3TB of actual storage, but only 1TB of parity!?

I know I'm missing something here; can someone help me out? Also, for my use case, which would be better, RAID-5 or RAID-6? Fault tolerance is my highest priority at this point. Since it's going to run over a network for home use only, speed isn't hugely critical.

Best Answer

RAID-5 just XORs the corresponding bits from each data drive and stores the result as parity. If you lose any one drive, you can rebuild the missing data from the others.

For background:

A  B  A XOR B
0  0     0
0  1     1
1  0     1
1  1     0
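The property that makes recovery possible is that XOR is its own inverse: XORing with the same bit twice cancels it out, so (A XOR B) XOR B gives back A. A quick Python check over this truth table (the function name is just for illustration):

```python
# Build the XOR truth table and confirm the property RAID-5 relies on:
# XOR-ing with the same value twice returns the original value.
def xor_truth_table():
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            rows.append((a, b, a ^ b))
    return rows

for a, b, p in xor_truth_table():
    assert p ^ b == a  # recover A from the parity and B
    assert p ^ a == b  # recover B from the parity and A
    print(a, b, p)
```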

Assume that D is the XOR of the other columns. Then, as long as you lose only one drive, you can figure out what you lost by XORing the remaining columns together.

A B C D
1 0 0 1
0 1 0 1
1 1 0 0
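Here is a minimal sketch of that rebuild in Python, using the same three rows: it computes column D, then pretends drive B has failed and reconstructs it from A, C and D. The helper names are hypothetical, chosen for illustration.

```python
from functools import reduce
from operator import xor

# Rows of the example above: each row is one bit position,
# columns A, B, C are the data drives; D (parity) is computed below.
data_rows = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
]

def parity(bits):
    """XOR all values together; this is what column D stores."""
    return reduce(xor, bits, 0)

def rebuild(row_with_gap, parity_bit):
    """Recover the single missing value (marked None) in a row.

    XOR-ing the parity with every surviving value cancels them out,
    leaving exactly the value that was lost.
    """
    survivors = [b for b in row_with_gap if b is not None]
    return reduce(xor, survivors, parity_bit)

d_column = [parity(row) for row in data_rows]
print(d_column)  # [1, 1, 0]

# Simulate losing drive B (index 1) and rebuilding it from A, C and D.
lost = 1
rebuilt_b = [
    rebuild([b if i != lost else None for i, b in enumerate(row)], d)
    for row, d in zip(data_rows, d_column)
]
print(rebuilt_b)  # [0, 1, 1]
```

Because Python's `^` on integers is bitwise, the same two functions work unchanged on whole bytes, which is how the idea scales from single bits to real disk blocks.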

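In RAID-5 the parity is not kept on a dedicated drive as in your diagram (that layout is RAID-4); instead the parity block rotates across all the drives from stripe to stripe, but the concept is the same.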
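To picture the rotation, here is a small sketch that prints which drive holds parity on each stripe. It assumes one particular scheme: parity starts on the last drive and moves one drive left per stripe, with data blocks filling the other drives starting just after the parity drive and wrapping around. This is modelled on what Linux md calls the left-symmetric layout, but real controllers use different layouts, so treat the exact placement as an assumption. `raid5_layout` is a hypothetical helper.

```python
def raid5_layout(num_drives, num_stripes):
    """Return a grid [stripe][drive] of labels: 'P' = parity, 'Dk' = data block k.

    Hypothetical helper. Assumes a left-symmetric-style rotation: parity
    moves one drive left per stripe, and data follows the parity drive,
    wrapping around to drive 0.
    """
    grid = []
    block = 0
    for stripe in range(num_stripes):
        row = [None] * num_drives
        parity_drive = (num_drives - 1) - (stripe % num_drives)
        row[parity_drive] = "P"
        for offset in range(1, num_drives):
            drive = (parity_drive + offset) % num_drives
            row[drive] = f"D{block}"
            block += 1
        grid.append(row)
    return grid

for row in raid5_layout(4, 4):
    print("  ".join(f"{cell:>3}" for cell in row))
```

With four drives, each drive holds parity for one stripe in four, so no single drive becomes a parity bottleneck, yet every stripe still has exactly one parity block.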

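So for RAID-5, no matter how many drives you use, you give up exactly one drive's worth of capacity to parity, sized to the smallest drive in the array: with N drives, usable space is (N - 1) × the smallest drive. That is why 1TB of parity is enough for 3TB of data: each parity bit covers one bit position across all the data drives, not a full copy of them. The conventional minimum is three drives; the "5" is just the RAID level, not a drive count.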
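The capacity rule is simple arithmetic. A small sketch (the function name is made up for illustration) that also covers RAID-6, which gives up two drives' worth instead of one:

```python
def usable_capacity_tb(drive_sizes_tb, parity_drives):
    """Usable capacity of a parity RAID array, in the same units as the input.

    Hypothetical helper. Every member is treated as if it were the size of
    the smallest drive; parity_drives is 1 for RAID-5 and 2 for RAID-6.
    """
    n = len(drive_sizes_tb)
    # Conventional minimums: 3 drives for RAID-5, 4 for RAID-6.
    if n < parity_drives + 2:
        raise ValueError("too few drives for this RAID level")
    return (n - parity_drives) * min(drive_sizes_tb)

print(usable_capacity_tb([1, 1, 1, 1], 1))  # RAID-5, four 1TB drives -> 3
print(usable_capacity_tb([1, 1, 1, 1], 2))  # RAID-6, four 1TB drives -> 2
print(usable_capacity_tb([1, 1, 2], 1))     # the 2TB drive counts as 1TB -> 2
```

For a 2-4TB target like the one in the question, five 1TB drives would give 4TB under RAID-5 or 3TB under RAID-6.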

For personal use, RAID-5 is probably the better choice, since its parity computation is much simpler than RAID-6's.

RAID-6 is more complicated: on top of the XOR parity it computes a second parity using Galois-field arithmetic, which makes parity computation more expensive. In exchange, it can survive two simultaneous drive failures instead of one. However, if you rebuild your array as soon as you get a single failure, you should be fine sticking with RAID-5.
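To make the Galois-field remark concrete, here is a byte-level sketch of a P+Q scheme. P is the plain XOR parity as in RAID-5; Q multiplies each data byte by a power of a generator g = 2 in GF(2^8) before XOR-ing. The field polynomial 0x11D and generator 2 match the parameters described for Linux md's RAID-6, but treat this as an illustrative model, not any driver's actual code; all function names are hypothetical. The extra multiplications and the field inverse are the computational cost mentioned above.

```python
# Hypothetical byte-level model of RAID-6 P+Q parity over GF(2^8).
# Assumes field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) and generator g = 2.
POLY = 0x11D

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8): carry-less multiply, reduced mod POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result

def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    # Every nonzero element satisfies a^255 == 1, so a^254 is its inverse.
    return gf_pow(a, 254)

def pq_parity(data):
    """P is the XOR of all bytes; Q is the XOR of g^i * data[i]."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two(data, x, y, p, q):
    """Rebuild data[x] and data[y] (both lost) from the survivors plus P and Q."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d                        # leaves Dx ^ Dy
            qxy ^= gf_mul(gf_pow(2, i), d)  # leaves g^x*Dx ^ g^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # Substitute Dy = pxy ^ Dx into the Q equation and solve for Dx.
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    return dx, pxy ^ dx

data = [0x12, 0x34, 0x56, 0x78]
p, q = pq_parity(data)
damaged = [d if i not in (1, 3) else None for i, d in enumerate(data)]
print([hex(b) for b in recover_two(damaged, 1, 3, p, q)])  # ['0x34', '0x78']
```

This only covers losing two data drives; losing one data drive plus P or Q reduces to the RAID-5 XOR case or a single field division.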
