Linux – Bad performance with Linux software RAID5 and LUKS encryption

disk-encryption, linux, luks, raid-5, software-raid

I have set up a Linux software RAID5 on three hard drives and want to encrypt it with cryptsetup/LUKS. My tests showed that the encryption leads to a massive performance decrease that I cannot explain.

The RAID5 is able to write 187 MB/s [1] without encryption. With encryption on top of it, write speed is down to about 40 MB/s.

The RAID has a chunk size of 512K and a write intent bitmap. I used -c aes-xts-plain -s 512 --align-payload=2048 as the parameters for cryptsetup luksFormat, so the payload should be aligned to 2048 blocks of 512 bytes (i.e., 1 MB). cryptsetup luksDump shows a payload offset of 4096 (4096 × 512 bytes = 2 MB, a multiple of 1 MB). So I think the alignment is correct and matches the RAID chunk size.
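
For reference, the formatting and alignment check would look roughly like this (a sketch; /dev/md0 is an assumption, substitute the actual RAID device):

    # format the RAID device with the parameters above (this destroys existing data)
    cryptsetup luksFormat -c aes-xts-plain -s 512 --align-payload=2048 /dev/md0

    # check the resulting payload offset (reported in 512-byte sectors)
    cryptsetup luksDump /dev/md0 | grep -i "payload offset"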

The CPU is not the bottleneck, as it has hardware support for AES (aesni_intel). If I write to another drive (an SSD with LVM) that is also encrypted, I get a write speed of 150 MB/s. top shows that the CPU usage is indeed very low; only the RAID5 XOR takes 14%.
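
For completeness, the hardware AES support can be checked like this (the cryptsetup benchmark subcommand only exists in newer cryptsetup releases, so that part is optional):

    # the CPU flag and the kernel module should both be present
    grep -m1 -o aes /proc/cpuinfo
    lsmod | grep aesni

    # newer cryptsetup versions can benchmark the available ciphers directly
    cryptsetup benchmark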

I also tried putting a filesystem (ext4) directly on the unencrypted RAID to see if the layering is the problem. The filesystem decreases the performance a little, as expected, but not nearly that much (write speed varies, but stays above 100 MB/s).

Summary:
Disks + RAID5: good
Disks + RAID5 + ext4: good
Disks + RAID5 + encryption: bad
SSD + encryption + LVM + ext4: good

The read performance is not affected by the encryption: it is 207 MB/s without and 205 MB/s with encryption (which also shows that CPU power is not the problem).

What can I do to improve the write performance of the encrypted RAID?

[1] All speed measurements were done with several runs of dd if=/dev/zero of=DEV bs=100M count=100 (i.e., writing 10G in blocks of 100M).
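
For example, a run against the opened LUKS mapping could look like this (the mapping name md0_crypt is only an assumption here):

    dd if=/dev/zero of=/dev/mapper/md0_crypt bs=100M count=100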

Edit: If this helps:
I'm using Ubuntu 11.04 64-bit with Linux 2.6.38.

Edit2: The performance stays approximately the same if I pass a block size of 4KB, 1MB or 10MB to dd.

Best Answer

The solution is to increase the stripe_cache_size parameter for md RAID devices.

By default it is set to 256, but it can be increased up to 32768.

This is done by writing the desired size to /sys/block/md0/md/stripe_cache_size (if the RAID device is md0). On Ask Ubuntu there is a solution for setting the value permanently.
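
A minimal sketch, assuming the array is md0 and using 8192 as the target value (which turned out to be a good choice in the tests below):

    # show the current value (default: 256)
    cat /sys/block/md0/md/stripe_cache_size

    # increase the stripe cache (needs root; not persistent across reboots)
    echo 8192 > /sys/block/md0/md/stripe_cache_size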

I tested on the exact same RAID as in the question, and I got the following numbers:

size   256: 50 MB/s
size  4096: 123 MB/s
size  8192: 142 MB/s
size 16384: 140 MB/s
size 32768: 142 MB/s

These tests were conducted with Ubuntu 12.04 (Linux 3.2) by writing 10 GB into a file with blocks of 1 MB.
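
A rough sketch of such a test loop, assuming the array is md0 and the encrypted RAID is mounted at /mnt/raid (conv=fdatasync is added here to flush caches; the original measurements may have used different flags):

    for size in 256 4096 8192 16384 32768; do
        echo "$size" > /sys/block/md0/md/stripe_cache_size
        dd if=/dev/zero of=/mnt/raid/testfile bs=1M count=10240 conv=fdatasync
        rm /mnt/raid/testfile
    done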

Background: The stripe cache stores recently written blocks. If data is written continuously, it can happen that a first write covers only part of a stripe. In that case the RAID code has to read the complete stripe from disk, update it, and write it back completely. If a second write then comes in for another part of the same stripe, all of this would have to be done again. But if the cache is used and still contains the data from the first write, the read that would otherwise precede the second write can be omitted.
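
To put numbers on it for the setup from the question: with three disks and a 512K chunk size, a full stripe holds 2 × 512 KB = 1 MB of data plus one 512 KB parity chunk, so any write that is smaller than 1 MB or not aligned to a stripe boundary forces this read-modify-write cycle (unless the stripe is already in the cache).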

Usually a large write block size would prevent the problem (because full stripes are written at once, so no reads are necessary at all). However, the encryption layer appears to pass only small blocks down to the underlying device, which is why increasing the cache has such a positive effect.
