When you send the same set of files again, rsync is better suited because it will only send the differences. tar will always send everything, and this is a waste of resources when a lot of the data is already there. The tar + rsync + untar approach loses this advantage in that case, as well as the advantage of keeping the folders in sync with rsync --delete.
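For that sync use case, a minimal invocation could look like the following (paths and host are placeholders):
rsync -az --delete /src/dir/ user@server:/dest/dir/
Here -a preserves permissions, times and symlinks, -z compresses the data on the wire, and --delete removes files from the destination that no longer exist in the source.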
If you copy the files for the first time, first packing, then sending, then unpacking (AFAIK rsync doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync won't have to do any more work than tar anyway.
Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately, before it has counted all the files.
Tip 2: If you use rsync over ssh, you may also use either tar+ssh
tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'
or just scp
scp -Cr srcdir user@server:destdir
General rule: keep it simple.
UPDATE:
I've created 59M of demo data
mkdir tmp; cd tmp
for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done
and tested the file transfer to a remote server (not on the same LAN) several times, using both methods
time rsync -r tmp server:tmp2
real 0m11.520s
user 0m0.940s
sys 0m0.472s
time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)
real 0m15.026s
user 0m0.944s
sys 0m0.700s
while keeping separate logs of the ssh traffic packets sent
wc -l rsync.log rsync+tar.log
36730 rsync.log
37962 rsync+tar.log
74692 total
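The logs themselves could be collected in more than one way; one possible sketch, assuming tcpdump is available (interface and host names are placeholders), is to write one line per packet on the ssh port and count the lines afterwards:
sudo tcpdump -l -nn -i eth0 'tcp port 22 and host server' > rsync.log
wc -l then gives the packet count, since tcpdump prints one line per captured packet.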
In this case, I can't see any advantage in less network traffic by using rsync+tar, which is expected when the default MTU is 1500 and the files are 10k in size. rsync+tar generated more traffic, was slower by 2-3 seconds, and left behind two garbage files that had to be cleaned up.
I did the same tests on two machines on the same LAN, and there rsync+tar achieved much better times and much, much less network traffic. I assume that is because of jumbo frames.
Maybe rsync+tar would be better than plain rsync on a much larger data set. But frankly I don't think it's worth the trouble: you need double the space on each side for packing and unpacking, and there are a couple of other options, as I've already mentioned above.
It looks like this was a bug in mke2fs that caused it to use fallocate(fd, PUNCH_HOLE, ...) instead of fallocate(fd, DISCARD_ZERO, ...) when zeroing out the space in the inode tables (even when -E nodiscard was used).
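To see the practical difference between punching a hole and zeroing a range, here is a minimal sketch using the util-linux fallocate(1) utility on a throwaway file (sizes are arbitrary; both modes need filesystem support, e.g. ext4):
dd if=/dev/zero of=demo.img bs=64K count=16                # 1M fully allocated file
fallocate --punch-hole --offset 0 --length 64K demo.img    # deallocates the range; it reads back as zeros but the blocks are freed
fallocate --zero-range --offset 64K --length 64K demo.img  # zeroes the range while keeping the blocks allocated
du -h --apparent-size demo.img; du -h demo.img             # apparent size stays 1M, allocated size drops by the punched 64K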
I submitted a bug report to the upstream linux-ext4@vger.kernel.org mailing list after verifying this behaviour locally, and got a patch within an hour, subject:
e2fprogs: block zero/discard cleanups
They should be included in the e2fsprogs-1.45 release, and likely the 1.44.x maintenance release. If you want them in a vendor e2fsprogs release, I'd recommend patching and building your e2fsprogs to verify the fix works for you, reporting success to linux-ext4 so that the patches land sooner, and then submitting a bug report to your distro of choice so they pull the upstream patches into their releases.
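A rough sketch of that patch-and-verify step (the patch file name and the target device are placeholders; the git URL is the usual upstream e2fsprogs repository):
git clone https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
cd e2fsprogs
git am ~/block-zero-discard-cleanups.patch   # apply the posted patches
./configure && make
sudo ./misc/mke2fs -E nodiscard /dev/sdXN    # re-run the failing case with the freshly built mke2fs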
Best Answer
bsdtar (at least from libarchive 3.1.2) is able to detect sparse sections using the FS_IOC_FIEMAP ioctl on the file systems that support it (though it supports a number of other APIs as well); however, at least in my test, strangely enough, it is not able to handle the tar files it generates itself (looks like a bug though). Using GNU tar to extract them works, but then GNU tar can't handle some of the extended attributes that bsdtar supports. So creating the archive with bsdtar and extracting it with GNU tar works as long as the files don't have extended attributes or flags.
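A small round trip to try that combination (a sketch; file names and sizes are arbitrary, and whether the holes survive depends on the archive format and both tools' sparse support):
truncate -s 100M sparse.img                        # mostly-sparse test file
printf data | dd of=sparse.img bs=1 seek=4096 conv=notrunc
bsdtar -cf sparse.tar sparse.img                   # create with bsdtar
mkdir out && tar -C out -xf sparse.tar             # extract with GNU tar
du -h sparse.img out/sparse.img                    # compare allocated sizes to see whether the holes survived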
It still doesn't work for files that are fully sparse (only zeros), as the FS_IOC_FIEMAP ioctl then returns 0 extents and it looks like bsdtar doesn't handle that properly (another bug?).
star (Schily tar) is another open-source tar implementation that can detect sparse files (use the -sparse option) and doesn't have those bugs of bsdtar (but is not packaged by many systems).
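For reference, a possible star invocation (the f=archive syntax is my assumption from star's usual command style; only the -sparse option is taken from above):
star -c -sparse f=sparse.tar sparse.img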