Why Does Tar Produce Different Files Each Time?

I often have large directories that I want to transfer to a local computer from a server. Instead of using recursive scp or rsync on the directory itself, I'll often tar and gzip it first and then transfer it.

Recently, I wanted to check that this actually works, so I ran md5sum on two independently generated tar.gz archives of the same source directory. To my surprise, the MD5 hashes were different. I repeated this two more times and got a new value every time. Why am I seeing this result? Shouldn't two gzipped tar archives of the same directory, created in exactly the same way with the same version of GNU tar, be byte-for-byte identical?

For clarity, I have a source directory and a destination directory. In the destination directory I have dir1 and dir2. I'm running:

tar -zcvf /destination/dir1/source.tar.gz source && md5sum /destination/dir1/source.tar.gz >> md5.txt

tar -zcvf /destination/dir2/source.tar.gz source && md5sum /destination/dir2/source.tar.gz >> md5.txt

Each time I do this, I get a different result from md5sum. Tar produces no errors or warnings.
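
To see which layer changes between runs, one can hash each archive both as stored and after decompression. The sketch below recreates the question's experiment in a scratch directory (the scratch dir1/dir2 and the sample file are stand-ins for /destination/dir1, /destination/dir2 and the real source tree) and assumes GNU tar, which runs the external gzip program for -z. If the compressed hashes differ while the decompressed tar streams match, the difference lies in the gzip wrapper rather than in tar itself.

```shell
#!/bin/sh
# Scratch copy of the question's layout; "$work" stands in for /destination.
set -eu
work=$(mktemp -d)
mkdir -p "$work/source" "$work/dir1" "$work/dir2"
echo hello > "$work/source/file.txt"
cd "$work"

# Same archive built twice, as in the question (-v dropped for brevity).
tar -zcf dir1/source.tar.gz source
sleep 1   # gzip header timestamps have one-second resolution
tar -zcf dir2/source.tar.gz source

# Hash each archive as stored (.gz) and as the raw tar stream inside it.
for f in dir1/source.tar.gz dir2/source.tar.gz; do
    printf '%s\n' "$f"
    printf '  compressed:   %s\n' "$(md5sum < "$f" | cut -d' ' -f1)"
    printf '  decompressed: %s\n' "$(gzip -dc "$f" | md5sum | cut -d' ' -f1)"
done
```

The `sleep 1` matters: two archives created within the same second can happen to match even without any fix.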

Best Answer

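From the looks of things, you’re probably being bitten by gzip timestamps. With -z, tar pipes the archive through gzip, which records a timestamp in the gzip header (for piped input, the time of compression), so two archives with identical contents created at different moments differ in those header bytes. gzip’s -n option tells it not to store the name and timestamp; to pass it through tar, run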

GZIP=-n tar -zcvf ...
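
A minimal sketch of the question's two commands with GZIP=-n added, assuming GNU tar (which runs the external gzip program for -z, so gzip sees the GZIP variable). The scratch directory is a stand-in for /destination, and -v is dropped to keep the output short.

```shell
#!/bin/sh
# Scratch copy of the question's layout; "$work" stands in for /destination.
set -u
work=$(mktemp -d)
mkdir -p "$work/source" "$work/dir1" "$work/dir2"
echo hello > "$work/source/file.txt"
cd "$work" || exit 1

# The question's commands plus GZIP=-n, so gzip omits the timestamp
# (and name) from the .gz header.
GZIP=-n tar -zcf dir1/source.tar.gz source
md5sum dir1/source.tar.gz >> md5.txt
sleep 1   # a stored timestamp would now differ by at least one second
GZIP=-n tar -zcf dir2/source.tar.gz source
md5sum dir2/source.tar.gz >> md5.txt

cat md5.txt
```

With -n in effect, both lines of md5.txt should show the same hash; without it, they would normally differ.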

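Note that to get fully reproducible tarballs, you should also impose a fixed order on the archive members. By default tar stores a directory’s entries in the order the filesystem returns them, which can differ between filesystems or between copies of the same tree; GNU tar 1.28 and later can sort them by name: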

GZIP=-n tar --sort=name -zcvf ...
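
A sketch of a reproducible build of the question's archive, assuming GNU tar 1.28 or later (for --sort) and the external gzip program; the scratch directory and sample files are stand-ins for the question's source and /destination paths. Piping tar's output into `gzip -n` has the same effect as -z with GZIP=-n, without relying on the GZIP environment variable, which newer gzip releases may warn about.

```shell
#!/bin/sh
# Reproducible variant: fixed member order plus no gzip timestamp.
# "$work" and the sample files are stand-ins for /destination and source.
set -eu
work=$(mktemp -d)
mkdir -p "$work/source/sub" "$work/dir1" "$work/dir2"
echo one > "$work/source/b.txt"     # created before a.txt on purpose
echo two > "$work/source/a.txt"
echo three > "$work/source/sub/c.txt"
cd "$work"

# --sort=name: store each directory's entries sorted by name rather than
# in readdir order. Piping into "gzip -n" replaces -z plus GZIP=-n.
tar --sort=name -cf - source | gzip -n > dir1/source.tar.gz
sleep 1
tar --sort=name -cf - source | gzip -n > dir2/source.tar.gz

md5sum dir1/source.tar.gz dir2/source.tar.gz
tar -tzf dir1/source.tar.gz    # members listed in name order
```

Both checksums should match, and the listing should show a.txt before b.txt even though b.txt was created first.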

If your version of tar doesn’t support --sort, build a sorted file list with find and sort and feed it to tar instead:

find source -print0 | LC_ALL=C sort -z | GZIP=-n tar --no-recursion --null -T - -zcvf ...
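
The fallback pipeline can be checked by building the archive from the sorted list and confirming that the member order inside it is exactly the sorted path list. A sketch assuming GNU find, sort and tar (for -print0, -z and --null) and the external gzip; the scratch tree is a stand-in for the real source directory, and the output is piped into `gzip -n` instead of using -z with GZIP=-n.

```shell
#!/bin/sh
# Fallback for tar versions without --sort; "$work" and the sample
# files are stand-ins for the real source tree.
set -eu
work=$(mktemp -d)
mkdir -p "$work/source/sub"
echo one > "$work/source/b.txt"
echo two > "$work/source/a.txt"
echo three > "$work/source/sub/c.txt"
cd "$work"

# find lists every path (NUL-terminated), sort orders them bytewise
# (LC_ALL=C), and tar archives exactly those paths in that order;
# --no-recursion stops tar from adding a directory's contents a second time.
find source -print0 | LC_ALL=C sort -z |
    tar --no-recursion --null -T - -cf - | gzip -n > fallback.tar.gz

# tar lists directories with a trailing slash; strip it before comparing.
tar -tzf fallback.tar.gz | sed 's:/$::' > members.txt
find source | LC_ALL=C sort > expected.txt
diff expected.txt members.txt && echo "archive order matches the sorted path list"
```

LC_ALL=C is there because sort order under other locales depends on the locale’s collation rules, which could make the same tree sort differently on different machines.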