SSD/NVMe-friendly MD RAID1 rebuild in Linux


I use SSD and NVMe RAID1 arrays mostly to store virtual machine disks. More than 75% of the data is zeros (preallocated images, free space).

If a disk fails and gets replaced, the rebuild copies all the data to the replacement disk, which causes thermal throttling on NVMe and, I assume, extra wear on the SSD/NVMe. Is there a way to configure the rebuild to compare the data on both disks first and write to the new disk only where needed?

Or are SSD/NVMe drives supposed to check whether the incoming data is just zeros and, if the target blocks have never been written (so they already read back as zeros), simply drop the write without wasting write cycles?
And if the target blocks do hold data, should the drive just trim them so they read as zeros?
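For reference, a host-side version of that zero check can be sketched in shell. `is_zero_block` is a hypothetical helper, not part of mdadm or of any drive firmware: it reads one block with `dd`, deletes every NUL byte with `tr`, and treats the block as zero if nothing is left. It says nothing about what a particular SSD/NVMe controller does internally, which is firmware-specific.

```shell
#!/bin/sh
# Hypothetical helper: is_zero_block DEVICE_OR_FILE BLOCK_INDEX BLOCK_SIZE
# Returns success (0) if the block reads back as all zeros.
# A block past the end of the device reads as empty and is also treated as zero.
is_zero_block() {
    # Read exactly one block, strip every NUL byte, and count the bytes that remain.
    nonzero=$(dd if="$1" bs="$3" skip="$2" count=1 2>/dev/null | tr -d '\000' | wc -c)
    [ "$nonzero" -eq 0 ]
}
```

For example, `is_zero_block /dev/sdd 0 4096 && echo zero` checks the first 4 KiB of /dev/sdd (reading the raw device needs root).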

I found an old thread at https://www.spinics.net/lists/raid/msg57529.html but it did not provide an answer.

I tried a workaround, but I think it is ugly, and the array has to be offline.

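# Mark the old disk as failed and remove it from the array
mdadm --fail /dev/md0 /dev/sde
mdadm -r /dev/md0 /dev/sde

# (physically replace /dev/sde)

# Stop the array, then copy sdd onto the new sde; oflag=sparing skips blocks that already match
mdadm -S /dev/md0
ddpt if=/dev/sdd of=/dev/sde verbose=1 oflag=sparing
# Re-create the mirror without a resync; metadata version and data offset must match the original array
mdadm -C -v /dev/md0 --assume-clean -l 1 -n 2 /dev/sdd /dev/sde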
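The compare-before-write idea behind `oflag=sparing` (read both sides, write a block only when it differs) can be sketched in plain shell. `compare_write_copy` is a hypothetical helper, not an mdadm or ddpt feature; it assumes GNU coreutils (`dd`, `cmp`, `mktemp`, `wc`) and util-linux `blockdev`, and it should only run while the array is stopped, as in the workaround. It still reads every block from both disks, so it saves writes, not reads.

```shell
#!/bin/sh
# Hypothetical helper: compare_write_copy SRC DST [BLOCK_SIZE]
# Copies SRC onto DST block by block, but writes a block to DST only when it
# differs from what DST already holds. Prints the number of blocks written.
compare_write_copy() {
    src=$1
    dst=$2
    bs=${3:-4096}

    # Size of SRC in bytes: ask the kernel for block devices, count bytes otherwise.
    if [ -b "$src" ]; then
        size=$(blockdev --getsize64 "$src")
    else
        size=$(wc -c < "$src")
    fi
    # Round up so a trailing partial block is handled too.
    blocks=$(( (size + bs - 1) / bs ))

    tmp_src=$(mktemp)
    tmp_dst=$(mktemp)
    written=0
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # Read block i from both sides into scratch files.
        dd if="$src" of="$tmp_src" bs="$bs" skip="$i" count=1 2>/dev/null
        dd if="$dst" of="$tmp_dst" bs="$bs" skip="$i" count=1 2>/dev/null
        # cmp -s exits 0 when the blocks are identical; write only on a mismatch.
        if ! cmp -s "$tmp_src" "$tmp_dst"; then
            dd if="$tmp_src" of="$dst" bs="$bs" seek="$i" count=1 conv=notrunc 2>/dev/null
            written=$((written + 1))
        fi
        i=$((i + 1))
    done
    rm -f "$tmp_src" "$tmp_dst"
    echo "$written"
}
```

With the array stopped, usage would look like `compare_write_copy /dev/sdd /dev/sde 1048576`. Forking `dd` for every block makes this much slower than ddpt; a larger block size reduces that overhead, at the cost of rewriting a whole block whenever any byte in it differs.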

Any ideas for a compare-before-write RAID1 rebuild? Thanks.

Best Answer

The short answer is: No.

The md driver is tuned for performance, and it uses a simple dirty map to keep RAID1 members in sync.

So if a member fails, the whole map is dirty. Since md works purely at the block level, it does not care about the contents of the blocks: it just copies them over and clears the dirty bits for those blocks.

KISS principle: keep it simple, stupid. Anything smarter would belong at a higher level.

If you want that, you could use DRBD with two local members instead of md RAID1. DRBD provides the means to verify before syncing.
