I often find myself sending folders with 10K to 100K files to a remote machine (within the same on-campus network).
I was wondering whether there are reasons to believe that tar + rsync + untar, or simply tar (from src to dest) + untar, could be faster in practice than rsync when transferring the files for the first time.
I am interested in an answer that addresses the above in two scenarios: using compression and not using it.
Update
I have just run some experiments moving 10,000 small files (total size = 50 MB), and tar+rsync+untar was consistently faster than running rsync directly (both without compression).
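For anyone wanting to repeat an experiment of this kind, a data set of that shape could be generated with something like the sketch below. The helper name make_test_files, the file naming, and the use of random bytes are assumptions for illustration; the post does not say how the test files were created.

```shell
# make_test_files DIR COUNT SIZE
# Hypothetical helper: creates COUNT files of SIZE random bytes each in DIR.
make_test_files() {
    dir=$1 count=$2 size=$3
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # Random content compresses poorly, so this is a worst case for -z tests.
        head -c "$size" /dev/urandom > "$dir/file_$i.bin" || return 1
        i=$((i + 1))
    done
}

# Example: 10,000 files of 5,000 bytes each, about 50 MB in total.
# make_test_files ./data 10000 5000
```

Each transfer method can then be wrapped in `time` to compare wall-clock durations over several runs.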
Best Answer
When you send the same set of files repeatedly, rsync is better suited because it only sends the differences. tar will always send everything, which is a waste of resources when a lot of the data is already there. tar + rsync + untar loses this advantage in that case, as well as the ability to keep the folders in sync with rsync --delete.
If you copy the files for the first time, packing first, then sending, then unpacking (AFAIK rsync doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync won't have to do any more work than tar anyway.
Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately, before it has counted all the files.
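To make the two approaches being compared concrete, here is a minimal sketch of each as a shell function. SRC, HOST and DEST are placeholder values, not taken from the post, and the remote command assumes DEST contains no single quotes.

```shell
# Placeholder values (assumptions): adjust to your environment.
SRC=${SRC:-./data}
HOST=${HOST:-user@remote.example}
DEST=${DEST:-/tmp/data}

# Plain rsync: only differences are sent on later runs, and --delete
# removes remote files that no longer exist locally. Add -z to compress.
direct_rsync() {
    rsync -a --delete "$SRC"/ "$HOST:$DEST"/
}

# tar + rsync + untar: pack locally, ship one archive, unpack remotely.
# Needs temporary space for the archive on both sides.
tar_rsync_untar() {
    archive=$(mktemp "${TMPDIR:-/tmp}/xfer.XXXXXX") || return 1
    tar -cf "$archive" -C "$SRC" . &&
    rsync "$archive" "$HOST:$DEST.tar" &&
    ssh "$HOST" "mkdir -p '$DEST' && tar -xf '$DEST.tar' -C '$DEST' && rm -f '$DEST.tar'"
    status=$?
    rm -f "$archive"   # clean up the local temporary archive either way
    return $status
}
```

Note that the second function has to clean up an archive on each side, which is part of what makes it more cumbersome.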
Tip 2: If you use rsync over ssh, you may also use either tar+ssh or just scp.
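As an illustration of these two alternatives, the sketch below pipes tar straight through ssh, so no temporary archive is written on either side, and shows the scp equivalent. SRC, HOST and DEST are placeholder values that do not come from the post, and the remote command assumes DEST contains no single quotes.

```shell
# Placeholder values (assumptions): adjust to your environment.
SRC=${SRC:-./data}
HOST=${HOST:-user@remote.example}
DEST=${DEST:-/tmp/data}

# tar piped over ssh. Pass "z" as the first argument to gzip the stream,
# or nothing to send it uncompressed.
tar_over_ssh() {
    z=${1:-}
    tar -c"$z"f - -C "$SRC" . |
        ssh "$HOST" "mkdir -p '$DEST' && tar -x${z}f - -C '$DEST'"
}

# scp: recursive copy. If DEST already exists on the remote side, the
# files end up in DEST/<basename of SRC> instead of DEST itself.
scp_copy() {
    scp -r "$SRC" "$HOST:$DEST"
}
```

Usage would be `tar_over_ssh` for the uncompressed case and `tar_over_ssh z` for the compressed one, which covers both scenarios from the question.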
General rule: keep it simple.
UPDATE:
I created 59M of demo data and tested the file transfer to a remote server (not on the same LAN) several times, using both methods, while keeping separate logs of the ssh traffic packets sent.
In this case, I can't see any reduction in network traffic from using rsync+tar, which is expected when the default MTU is 1500 and the files are 10k in size. rsync+tar generated more traffic, was 2-3 seconds slower, and left two garbage files that had to be cleaned up.
I ran the same tests on two machines on the same LAN, and there rsync+tar achieved much better times and far less network traffic. I assume this is because of jumbo frames.
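A rough back-of-the-envelope check of the MTU reasoning can be done with shell arithmetic. This is only a sketch under stated assumptions: 40 bytes of IPv4 and TCP headers per packet with no TCP options, no ssh or rsync protocol overhead, "10k" taken as 10240 bytes, and 9000 as the jumbo-frame MTU (a common value, but not one given in the post).

```shell
# packets_for SIZE MTU
# Hypothetical helper: minimum number of TCP segments needed to carry
# SIZE payload bytes, assuming 40 bytes of IPv4 + TCP headers per packet.
packets_for() {
    size=$1 mtu=$2
    mss=$((mtu - 40))
    echo $(( (size + mss - 1) / mss ))   # integer ceiling division
}

packets_for 10240 1500   # prints 8
packets_for 10240 9000   # prints 2
```

With a 1500-byte MTU a single file already spans several full packets, so bundling files into one archive saves little per file, whereas with jumbo frames a file fits in about two packets and per-file overhead matters relatively more.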
Maybe rsync+tar would be better than plain rsync on a much larger data set. But frankly, I don't think it's worth the trouble: you need double the space on each side for packing and unpacking, and there are a couple of other options, as I've already mentioned above.