Linux – How to improve performance of RSYNC’s delta transfer on receiving host using software only

Tags: backup, linux, nas, performance, rsync

I'm using RSYNC to back up VirtualBox VMs from one server to a Synology DS1512+ NAS. The important point is that I really want to back up the VM images themselves, NOT individual files within those images. I'm already doing that separately and it is NOT the problem here.

Backing up all those images using --whole-file takes ~3 hours. But the NAS uses BTRFS and I would like to use its snapshot feature to really only store differences, which doesn't work with --whole-file, because the whole file gets transferred and completely rewritten. --inplace is used already, but it doesn't change that particular aspect, only whether new files are created or not. To make efficient use of snapshots, RSYNC really needs to transfer only the differences between files.
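
To make the intent concrete, the workflow I'm aiming for looks roughly like this (a minimal sketch; the subvolume paths and the reduced flag set are placeholders, not my actual setup):

# On the NAS, before each sync: a read-only snapshot preserves the previous
# state, so after the in-place sync only the changed blocks take up new space.
btrfs subvolume snapshot -r /volume1/backup/vms /volume1/backup/vms@$(date +%F)

# From the server: delta transfer into the existing images (no --whole-file),
# so changed blocks are rewritten in place instead of the files being recreated.
rsync --recursive --times --inplace /vms/ nas:/volume1/backup/vms/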

And that's the problem: when removing --whole-file to transfer only those differences, the time needed to back up the same amount of data increases a lot. I killed RSYNC after it had already been running for 10 hours, because I need it to finish far earlier so it doesn't overlap with other backups etc. Looking at the files transferred after those 10 hours, it seemed to have gotten only about halfway anyway. So delta transfer is far too slow for some reason.

I'm fairly sure that the bottleneck is I/O on the NAS: the server doesn't have much of it, and even in theory it shouldn't matter much whether the server reads using --whole-file or not. Some of those VMs are hundreds of GiB in size and the server uses ZFS, so those images are not necessarily laid out for optimal sequential reads anyway. It has plenty of free RAM to cache things, and its disks are more or less idle when not using --whole-file.
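
This is the kind of monitoring I base that on (a sketch, assuming the sysstat tools are available; on the DS1512+ they may need to be installed separately):

# Per-device read/write throughput and utilization, refreshed every 5 seconds.
# High %util combined with modest rkB/s and wkB/s would point to seek-bound,
# random I/O rather than raw disk bandwidth as the limit.
iostat -x 5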

Reads aren't too slow on the NAS either, though: while there are some drops, they reach 50-70 MiB/s for longer periods of time. Writes don't seem too slow either, but are nowhere near as fast as with --whole-file, where they reach 100+ MiB/s for long periods. What's somewhat interesting is the CPU load, which is pretty high especially when not using --whole-file and is most likely caused by BTRFS compression. But that compression is needed as well to make efficient use of the available space.
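
For reference, the compression setting involved can be read from the BTRFS mount options on the NAS:

# Show the BTRFS mount options, including whether compress or compress-force
# is set and which algorithm (zlib, lzo, zstd) is in use.
grep btrfs /proc/mounts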

[Screenshot: htop on the NAS]

My expectation was that, at least for reads, it shouldn't matter much in my setup whether --whole-file is used or not. BTRFS on the NAS and ZFS on the server don't necessarily lay written files out for sequential reads anyway. While I guessed that bursts wouldn't be as high as with --whole-file, I expected delta transfer to minimize the overall amount of data to write, so that the two effects would roughly cancel each other out. But that doesn't seem to be the case for some reason.

Finally, I'm using the following options:

--owner \
--numeric-ids \
--compress-level=0 \
--group \
--perms \
--rsh=rsh \
--devices \
--hard-links \
--inplace \
--whole-file \
--links \
--recursive \
--times \
--delete \
--delete-during \
--delete-excluded \
--rsync-path=[...] \
--specials

Is there anything obvious in those options that explains the difference between --whole-file and not? Something known to behave badly in the latter case? Is there anything that can be improved on the receiving side using RSYNC?

Investing money in better hardware like SSDs etc. is not an option. Either I find some wrong usage of RSYNC, or I need to live with --whole-file and not having snapshots.

Thanks for your suggestions!

Best Answer

With --whole-file, no reads need to be done on the target side; it can just truncate the file and write it out buffered. It also doesn't need to do any checksumming; you are transferring the whole file regardless.

Without --whole-file, it has to read the whole file on the target side and overwrite the blocks that changed. Reading, especially on copy-on-write file systems like ZFS and BTRFS, can be slower than writing.
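
If you want to confirm that target-side reads are the limiting factor, you could time both modes on a single large image (a sketch; the image path and host are placeholders):

# --ignore-times forces a transfer even when size and mtime already match,
# so both runs actually move data instead of being skipped by the quick check.
time rsync --inplace --ignore-times --whole-file    /vms/big-image.vdi nas:/volume1/backup/vms/
time rsync --inplace --ignore-times --no-whole-file /vms/big-image.vdi nas:/volume1/backup/vms/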

So I suspect the difference in performance you are seeing comes from the greater sequentiality of writes vs. reads. Depending on your ZFS recordsize, you should set the same value with --block-size. The ZFS recordsize defaults to 128 KiB; if your version supports large blocks, you can set both the ZFS recordsize and the rsync block size to 1 MiB, which will help reduce fragmentation.
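
As a sketch (dataset name, paths and the remaining options are placeholders; whether --block-size accepts values above 128 KiB depends on your rsync version):

# On the sending server: check (and optionally raise) the recordsize of the
# dataset holding the images. Raising it needs the large_blocks feature and
# only affects newly written data.
zfs get recordsize tank/vms
zfs set recordsize=1M tank/vms

# Match rsync's delta-transfer block size to the recordsize.
# 131072 bytes = 128 KiB, the ZFS default.
rsync --inplace --block-size=131072 [other options] /vms/ nas:/volume1/backup/vms/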
