Why is rsync slow?

disk-image networking osx rsync

I am running an rsync job that transfers about 45 million files, approximately 1.8 TB of data (a Mac OS X Time Machine backup), over a 100 Mbit/s connection.
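For scale, here is a rough back-of-the-envelope calculation of the best case for these numbers. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes), a link running at its full nominal rate, and no protocol or per-file overhead, none of which is stated above.

```python
def min_transfer_hours(total_bytes, link_bits_per_second):
    """Lower bound on transfer time, ignoring all overhead."""
    return total_bytes * 8 / link_bits_per_second / 3600

TOTAL_BYTES = 1.8e12      # about 1.8 TB, assumed decimal
LINK_BITS_PER_S = 100e6   # 100 Mbit/s nominal
FILE_COUNT = 45_000_000   # about 45 million files

best_case_hours = min_transfer_hours(TOTAL_BYTES, LINK_BITS_PER_S)
average_file_bytes = TOTAL_BYTES / FILE_COUNT

print(f"best case: {best_case_hours:.0f} hours")
print(f"average file size: {average_file_bytes / 1000:.0f} kB")
```

With an average file size this small, the per-file cost is likely to matter far more than raw bandwidth.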

I use rsync 3.1.1 from MacPorts (I first tried the built-in rsync, version 2.6.9, since it has a Mac OS X specific cache parameter, but it ran out of memory) with the following parameters:

rsync -HzvhErlptgoDW --stats --progress --out-format="%t %f %b" /source/ /destination/

The source is an external 3.5" HDD connected with FireWire 800. The destination is a sparse disk image bundle mounted locally (but its "source file" is on network storage). Initially I got good speeds, 7-9 MB/s for reasonably large files, but the longer the operation runs (I restarted it three days ago), the slower it gets. There are also long pauses when nothing happens, like this:

2011-01-22-070305/Macintosh HD/Library/Application Support/Apple/Mail/Stationery/Apple/Contents/Resources/Photos/Contents/Resources/Bamboo.mailstationery/Contents/Resources/Mask3.png
          1.28K 100%    3.26kB/s    0:00:00 (xfr#48406, ir-chk=1050/4166332)
2016/01/16 18:26:48 Volumes/src/Backups.backupdb/mm/2011-01-22-070305/Macintosh HD/Library/Application Support/Apple/Mail/Stationery/Apple/Contents/Resources/Photos/Contents/Resources/Bamboo.mailstationery/Contents/Resources/Mask3.png 313
2011-01-22-070305/Macintosh HD/Library/Application Support/Apple/Mail/Stationery/Apple/Contents/Resources/Photos/Contents/Resources/Bamboo.mailstationery/Contents/Resources/banner-green.jpg
         32.26K 100%    0.00kB/s    0:00:00 (xfr#48407, ir-chk=1049/4166332)
2016/01/16 19:17:37 Volumes/2TB/Backups.backupdb/mm/2011-01-22-070305/Macintosh HD/Library/Application Support/Apple/Mail/Stationery/Apple/Contents/Resources/Photos/Contents/Resources/Bamboo.mailstationery/Contents/Resources/banner-green.jpg 31279

(I couldn't bold the timestamps, but as you can see, the first file finished at 18:26 and the second at 19:17, and the second file is just 32 kB.)
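Pauses like this can be found without reading the log by eye. The sketch below assumes the log lines look like the two timestamped lines above (date, time, path, byte count, as produced by the --out-format string used here); the function name and the 60-second threshold are arbitrary choices for illustration, and the sample paths are abbreviated with "...".

```python
import re
from datetime import datetime

# Matches "<date> <time> <path> <bytes>"; the path may contain spaces.
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*) (\d+)$")

def find_pauses(lines, threshold_seconds=60):
    """Return (gap_seconds, path) for files logged long after the previous one."""
    pauses = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # progress lines and bare file names are skipped
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        if previous is not None:
            gap = (stamp - previous).total_seconds()
            if gap > threshold_seconds:
                pauses.append((gap, match.group(2)))
        previous = stamp
    return pauses

sample = [
    "2016/01/16 18:26:48 Volumes/src/.../Mask3.png 313",
    "         32.26K 100%    0.00kB/s    0:00:00 (xfr#48407, ir-chk=1049/4166332)",
    "2016/01/16 19:17:37 Volumes/2TB/.../banner-green.jpg 31279",
]
for gap, path in find_pauses(sample):
    print(f"{gap:.0f} s pause before {path}")
```

Running this over the whole log would show whether the pauses are rare outliers or a steady pattern.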

I don't think the transfer is CPU limited. There are some CPU spikes, but generally CPU load is less than 10%. The three rsync processes spawned by this operation have, all in all, used almost exactly 5h of CPU time in the 72h the transfer has been going on. The computer itself idles 23h a day.

Nor is memory a problem. Memory pressure has been "green" since the operation began.

Kernel task has accumulated quite a bit of CPU time (57h when I write this), but on the other hand, the uptime is 25 days and all these 57h can't have been consumed by rsync.

Some final details

  • I had had this process running for a couple of days when I restarted it to get better logging three days ago. It took nine hours before the first file was transferred.
  • I first used Finder to transfer this directory tree from the same source to the same destination. That took 3 days, all in all. Now I have spent 6 days and I don't think I have transferred even a third of the tree.
  • I have tried transferring files between the same source and destination outside of this operation and they go at full speed.

Best Answer

The destination is a sparse disk image bundle mounted locally (but its "source file" is on a network storage).

This is your problem. You're seeing the lame performance of whatever protocol backhauls the data between your local machine and your network storage (e.g. SMB, AFP, or NFS). It's a common pitfall.

Rsync needs to read each file (if datetimes differ) to see which bits to send. In your situation, your filesystem is pulling the entire file from your network storage to your local Mac before rsync can read it. There's your slowdown.
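One way to check whether the network-backed image is the bottleneck is to time the creation of many small files on it and compare that with a local disk. The following is only a rough probe under assumptions of my own: the function name, file count, and file size are arbitrary, and you would pass in a scratch directory on the mounted image yourself.

```python
import os
import tempfile
import time

def seconds_per_small_file(directory, count=200, size=1024):
    """Write `count` small files into `directory` and return the mean time per file."""
    payload = b"\0" * size
    paths = []
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"probe_{i:05d}.tmp")
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data out
        paths.append(path)
    elapsed = time.perf_counter() - start
    for path in paths:
        os.remove(path)  # clean up the probe files
    return elapsed / count

if __name__ == "__main__":
    # Baseline on local temporary storage; repeat with a directory
    # on the mounted disk image to compare.
    with tempfile.TemporaryDirectory() as local_dir:
        print(f"local: {seconds_per_small_file(local_dir) * 1000:.2f} ms per file")
```

If the figure for the mounted image is many times the local one, per-file round trips to the network storage would explain most of the slowdown.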

N.B. Kudos for being so clear about that network-backing.
