How to speed up operations on sparse files with tar, gzip, rsync

rsync sparse-files tar

I have a sparse file. (du -h reports 3G and du -h --apparent-size reports 100G.) So far, so good.
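For anyone who wants to reproduce this situation at a smaller scale, here is a minimal sketch. It assumes GNU coreutils (truncate, dd, stat) and a file system that supports holes; the file name and sizes are made up for illustration.

```shell
# Sketch: build a small sparse file and compare its two sizes.
tmpdir=$(mktemp -d)
f="$tmpdir/sparse.img"          # hypothetical file name

# truncate sets the length without writing data, so the whole range starts as a hole.
truncate -s 100M "$f"

# Write a few real bytes at offset 50 MiB, leaving holes before and after them.
printf 'data' | dd of="$f" bs=1M seek=50 conv=notrunc status=none

apparent=$(stat -c %s "$f")                                # logical size in bytes
allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))   # bytes actually allocated

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

rm -rf "$tmpdir"
```

The first number corresponds to what du --apparent-size reports, the second to what plain du reports.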

Now, when I compress the file with tar or send it over the network with rsync, it takes as long as it would for 100G. It seems these tools read all the zeros.

I thought the holes were marked somehow, so that these tools could just skip them?

Is there an issue with my file?

Is it a missing feature in tar and rsync that they don't detect sparse files? I tried tar's --sparse option, but that didn't speed things up. Neither did rsync's --sparse option.

Is there any way to speed these tools up on sparse files?

Best Answer

bsdtar (at least from libarchive 3.1.2) can detect sparse sections using the FS_IOC_FIEMAP ioctl on file systems that support it (it supports a number of other APIs as well). Strangely enough, at least in my test, it cannot handle the tar files it generates itself, which looks like a bug.

Extracting them with GNU tar works, however, although GNU tar can't handle some of the extended attributes that bsdtar supports.

So

bsdtar cf - sparse-files | (cd elsewhere && tar xpf -)

works as long as the files don't have extended attributes or flags.
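After a transfer like this, it is worth checking that the copy is identical and still sparse. The helper below is only a sketch: check_sparse_copy is a hypothetical name, and it assumes GNU stat and cmp.

```shell
# Hypothetical helper: succeeds only if COPY has the same bytes as ORIG
# and occupies fewer bytes on disk than its apparent size.
check_sparse_copy() {
    orig=$1
    copy=$2

    # Byte-for-byte comparison; holes read back as zeros, so this works on sparse files.
    cmp -s "$orig" "$copy" || { echo "content differs"; return 1; }

    apparent=$(stat -c %s "$copy")                                  # logical size in bytes
    allocated=$(( $(stat -c %b "$copy") * $(stat -c %B "$copy") ))  # bytes allocated on disk

    if [ "$allocated" -lt "$apparent" ]; then
        echo "sparse: $allocated of $apparent bytes allocated"
    else
        echo "not sparse: $allocated of $apparent bytes allocated"
        return 1
    fi
}
```

Call it with the original and the extracted file, for example check_sparse_copy sparse-files/disk.img elsewhere/sparse-files/disk.img (disk.img is a made-up name). Note that cmp still reads the full apparent size of both files, so the check itself is not fast.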

It still doesn't work for files that are fully sparse (only zeros), as the FS_IOC_FIEMAP ioctl then returns 0 extents and it looks like bsdtar doesn't handle that properly (another bug?).
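To reproduce the fully sparse case, a file consisting of a single hole is enough. This is a sketch assuming GNU truncate and stat; the file name is hypothetical, and the block count may differ slightly between file systems.

```shell
# Sketch: create a file that is one big hole, with no data written at all.
tmpdir=$(mktemp -d)
empty="$tmpdir/all-holes.img"   # hypothetical file name

truncate -s 10M "$empty"        # sets the length without writing any data

size=$(stat -c %s "$empty")     # apparent size in bytes
blocks=$(stat -c %b "$empty")   # number of allocated blocks

echo "size: $size bytes, allocated blocks: $blocks"

rm -rf "$tmpdir"
```

On most file systems the allocated block count is zero or very close to it, which matches the case where no extents are reported.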

star (Schily tar) is another open-source tar implementation that can detect sparse files (use the -sparse option) and doesn't have those bsdtar bugs (but few systems package it).
