Why does tar appear to skip file contents when output file is /dev/null


I have a directory containing over 400 GiB of data. I wanted to check that all the files in it can be read without errors, so a simple approach I thought of was to tar them into /dev/null. Instead, I see the following behavior:

$ time tar cf /dev/null .

real    0m4.387s
user    0m3.462s
sys     0m0.185s
$ time tar cf - . > /dev/null

real    0m3.130s
user    0m3.091s
sys     0m0.035s
$ time tar cf - . | cat > /dev/null
^C

real    10m32.985s
user    0m1.942s
sys     0m33.764s

I killed the third command with Ctrl+C after it had already been running for quite a while. Moreover, while the first two commands were running, the activity indicator of the storage device containing . was almost always idle, whereas with the third command it was lit constantly, indicating heavy activity.

So it seems that when tar can detect that its output file is /dev/null, i.e. when the file descriptor tar writes to was opened directly on /dev/null, the file contents appear to be skipped. (Adding the v option to tar still prints all the files in the directory being archived.)
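
A quick way to illustrate the difference between the two output arrangements is to look at what an inherited file descriptor actually refers to. The stat calls below are purely illustrative (they use stdin rather than stdout only so that the result stays visible on the terminal) and are not something tar itself runs:

$ stat -L -c '%F' /proc/self/fd/0 < /dev/null    # the descriptor refers to a character device
$ true | stat -L -c '%F' /proc/self/fd/0         # the descriptor refers to a fifo (pipe)

A redirection hands the process the device itself, while a pipeline hands it a pipe, which is presumably why the third command behaves differently.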

So I wonder: why is this so? Is it some kind of optimization? If so, why would tar even want to perform such a dubious optimization for such a special case?

I'm using GNU tar 1.26 with glibc 2.27 on Linux 4.14.105 amd64.

Best Answer

It is a documented optimization:

When the archive is being created to /dev/null, GNU tar tries to minimize input and output operations. The Amanda backup system, when used with GNU tar, has an initial sizing pass which uses this feature.
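
If you want to confirm this behavior on a small test tree, one option is to compare syscall counts with strace (assuming strace and GNU dd are available; /tmp/tartest and the file size below are arbitrary, and exact counts will vary):

$ mkdir -p /tmp/tartest
$ dd if=/dev/zero of=/tmp/tartest/blob bs=1M count=64 status=none
$ strace -f -c -e trace=openat,read,write tar cf /dev/null /tmp/tartest
$ strace -f -c -e trace=openat,read,write sh -c 'tar cf - /tmp/tartest | cat > /dev/null'

The first trace should show very few read and write calls, since tar skips the file data when it detects /dev/null; the second pipeline mirrors your third command and forces the data to actually be read.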
