How to check if a directory contains the same files as a TAR archive

command-line diff tar

Let's say I have a folder Documents and a TAR file Documents.tar. How can I check whether the TAR file contains the same files that are present in the directory?

The most obvious solution to me would be:

$ tar xvf Documents.tar -C untarDocs
$ diff -r Documents untarDocs

Unfortunately, this is very slow for large TAR files. Is there an alternative?

Using tar -dvf Documents.tar (or --diff, --compare) doesn't work, because it doesn't detect a file that is present in the filesystem but missing from the TAR file. It only detects a file that is present in the TAR file but missing from the filesystem, e.g.:

$ mkdir new
$ touch new/foo{1..4}
$ tar cvf new.tar new/
$ touch new/bar
$ tar --diff --verbose --file=new.tar       #### doesn't detect new/bar #########
$ rm new/foo1
$ tar --diff --verbose --file=new.tar

Output

new/
new/foo2
new/foo3
new/foo4
new/foo1
tar: new/foo1: Warning: Cannot stat: No such file or directory   ### works ###

Best Answer

If you want only to compare lists of file- and directory-names, the -d option is not helpful. Instead, diff'ing sorted lists from find and tar -tf would do that.

Starting with the names assumed in OP's original example:

$ tar xvf Documents.tar -C untarDocs
$ diff -r Documents untarDocs

Here is a suggested script to diff the filenames:

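#!/bin/sh
# Compare the names stored in Documents.tar with the names under Documents.
MYDIR=$(mktemp -d) || exit 1
# tar lists directories with a trailing slash, find does not; strip it so they match.
tar tf Documents.tar | sed 's|/$||' | sort >"$MYDIR/from-tar"
find Documents | sort >"$MYDIR/from-dir"
(cd "$MYDIR" && diff from-tar from-dir)
rm -rf "$MYDIR"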

This assumes that Documents.tar contains the same top-level "Documents" directory. If that is not a good assumption, then the lists should be filtered to remove the name of the top-level directory. OP did not indicate that this would be a problem, however.
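
If the top-level names do differ, one way to filter them is to drop the first path component from both lists before comparing. The following is only a minimal sketch: strip_top is a hypothetical helper, the "backup" name and the throwaway fixture exist purely for the demonstration, and it assumes every archive member sits under a single top-level directory.

```shell
#!/bin/sh
# strip_top (hypothetical helper): remove a trailing slash, drop the
# top-level entry itself, then remove the first path component.
strip_top() {
    sed -e 's|/$||' -e '/\//!d' -e 's|^[^/]*/||'
}

# Throwaway fixture: the archive's top-level directory is "backup",
# while the directory on disk is "Documents".
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir -p Documents/sub
touch Documents/a Documents/sub/b
cp -R Documents backup
tar cf Documents.tar backup
touch Documents/extra          # exists on disk only

tar tf Documents.tar | strip_top | LC_ALL=C sort >from-tar
find Documents | strip_top | LC_ALL=C sort >from-dir
# diff exits non-zero when the lists differ, hence the "|| true".
result=$(diff from-tar from-dir || true)
printf '%s\n' "$result"

cd / && rm -rf "$demo"
```

In this fixture the only reported difference should be the line for extra, marked with ">" because it appears only in the list taken from the directory.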

In any case, the lists must be sorted, because there is no guarantee about the order in which the tar and find programs list names.
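
Because both lists are sorted, comm can be used in place of diff to show which side each unmatched name comes from. This is a sketch under assumptions, not the method from the answer above: compare_names is a hypothetical helper, the fixture mirrors the "new" example from the question, and LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
#!/bin/sh
# compare_names (hypothetical helper): $1 = archive, $2 = directory,
# given with the same relative path that is stored in the archive.
compare_names() {
    tmp=$(mktemp -d) || return 1
    # tar prints directories with a trailing slash; strip it to match find.
    tar tf "$1" | sed 's|/$||' | LC_ALL=C sort >"$tmp/from-tar"
    find "$2" | LC_ALL=C sort >"$tmp/from-dir"
    # comm -3 hides names common to both lists:
    #   unindented   = only in the archive
    #   tab-indented = only in the directory
    LC_ALL=C comm -3 "$tmp/from-tar" "$tmp/from-dir"
    rm -rf "$tmp"
}

# Throwaway fixture mirroring the question's example.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir new
touch new/foo1 new/foo2 new/foo3 new/foo4
tar cf new.tar new
touch new/bar      # added after archiving
rm new/foo1        # removed after archiving

result=$(compare_names new.tar new)
printf '%s\n' "$result"

cd / && rm -rf "$demo"
```

Here new/bar is printed with a leading tab (present only in the directory) and new/foo1 without one (present only in the archive), so both kinds of mismatch are reported without extracting the archive.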

I used mktemp because OP appears to be using GNU tar (the -d option is the clue), which makes a Linux system likely.

There is, of course, no POSIX tar to consult with regard to -d, and pax does not do diffs either.
