I often have large directories that I want to transfer to a local computer from a server. Instead of using recursive scp or rsync on the directory itself, I'll often tar and gzip it first and then transfer it.
Recently, I've wanted to check that this is actually working, so I ran md5sum on two independently generated tar and gzip archives of the same source directory. To my surprise, the MD5 hashes were different. I did this two more times and got a new value each time. Why am I seeing this result? Shouldn't two tarred and gzipped archives of the same directory, both generated in exactly the same way with the same version of GNU tar, be identical?
For clarity, I have a source directory and a destination directory. In the destination directory I have dir1 and dir2. I'm running:
tar -zcvf /destination/dir1/source.tar.gz source && md5sum /destination/dir1/source.tar.gz >> md5.txt
tar -zcvf /destination/dir2/source.tar.gz source && md5sum /destination/dir2/source.tar.gz >> md5.txt
Each time I do this, I get a different result from md5sum. Tar produces no errors or warnings.
Best Answer
From the looks of things you’re probably being bitten by gzip timestamps; to avoid those, run:
Note that to get fully reproducible tarballs, you should also impose the sort order used by tar:
If your version of tar doesn’t support --sort, use this instead:
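As a hedged sketch of the three commands described above: the helper names below are hypothetical, and the sketch assumes GNU tar, gzip, and a find and sort that handle NUL-separated names. Each helper writes a gzipped tarball of the directory given as its first argument to the file given as its second. The gzip -n option leaves the timestamp and original name out of the gzip header, which is the part that otherwise changes from run to run.

```shell
#!/bin/sh
# Sketch only: the function names are made up for illustration.
# Each helper takes: $1 = directory to archive, $2 = output .tar.gz file.

# 1. Avoid the gzip timestamp: pipe tar into gzip -n instead of using tar -z.
targz_no_timestamp() {
    tar -cf - "$1" | gzip -n > "$2"
}

# 2. Also impose the member order (assumes a GNU tar that supports --sort).
targz_sorted() {
    tar --sort=name -cf - "$1" | gzip -n > "$2"
}

# 3. Same idea without --sort: list the paths with find, sort them in the
#    C locale, and have tar archive exactly that list without recursing.
targz_sorted_portable() {
    find "$1" -print0 \
        | LC_ALL=C sort -z \
        | tar --no-recursion --null -T - -cf - \
        | gzip -n > "$2"
}
```

With the paths from the question, a call might look like targz_sorted source /destination/dir1/source.tar.gz && md5sum /destination/dir1/source.tar.gz >> md5.txt. The tar stream still records file modification times, owners and permissions, so the hashes should only match while that metadata in the source directory is unchanged.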