Why do the gzipped versions of identical files produce different md5 checksums?


I have four files that I created from an svndump:

test.svn 
test2.svn 
test.svn.gz  
test2.svn.gz

When I run this:

md5sum test2.svn test.svn test.svn.gz test2.svn.gz

Here is the output:

89fc1d097345b0255825286d9b4d64c3  test2.svn
89fc1d097345b0255825286d9b4d64c3  test.svn
8284ebb8b4f860fbb3e03e63168b9c9e  test.svn.gz
ab9411efcb74a466ea8e6faea5c0af9d  test2.svn.gz

test.svn and test2.svn have the same checksum, so I can't understand why gzip compresses them differently. Is it putting a timestamp somewhere before compressing? I had a similar issue with mysqldump, because it was putting a date field at the top of its output.

Best Answer

gzip stores some of the original file's metadata in the header of the compressed file, including the file's modification time and its name, if available. See the GZIP file format specification (RFC 1952).
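To see where the difference lives, here is a minimal Python sketch that uses the standard gzip module to compress the same bytes under two different names and timestamps, then decodes the fixed 10-byte header from RFC 1952 plus the optional FNAME field. The helper names gzip_like_cli and read_header, the sample content and the timestamps are hypothetical, made up for illustration, and Python's gzip module only stands in for the gzip command; it is not the same implementation.

```python
import gzip
import io
import struct

FEXTRA = 0x04  # header flag bits defined in RFC 1952
FNAME = 0x08


def gzip_like_cli(data: bytes, name: str, mtime: int) -> bytes:
    """Hypothetical helper: store a file name and an mtime in the header, as gzip does by default."""
    buf = io.BytesIO()
    # With fileobj given, `filename` is used only for the FNAME header field.
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def read_header(blob: bytes) -> dict:
    """Hypothetical helper: decode MTIME and FNAME from a gzip member header."""
    magic, method, flags, mtime, _xfl, _os = struct.unpack("<2sBBIBB", blob[:10])
    if magic != b"\x1f\x8b" or method != 8:
        raise ValueError("not a deflate-compressed gzip stream")
    name = None
    pos = 10
    if flags & FEXTRA:  # skip the optional extra field: 2-byte length + payload
        (xlen,) = struct.unpack("<H", blob[pos:pos + 2])
        pos += 2 + xlen
    if flags & FNAME:  # zero-terminated Latin-1 original file name
        end = blob.index(b"\x00", pos)
        name = blob[pos:end].decode("latin-1")
    return {"mtime": mtime, "name": name}


dump = b"SVN-fs-dump-format-version: 2\n"  # stand-in for the identical .svn files
a = gzip_like_cli(dump, "test.svn", 1_700_000_000)
b = gzip_like_cli(dump, "test2.svn", 1_700_000_060)
print(read_header(a))  # {'mtime': 1700000000, 'name': 'test.svn'}
print(read_header(b))  # {'mtime': 1700000060, 'name': 'test2.svn'}
print(a == b)  # False: the headers differ
print(gzip.decompress(a) == gzip.decompress(b))  # True: same payload
```

Only the header bytes differ here; the deflate data and the CRC-32/size trailer are the same, which is why decompressing both gives identical output.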

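So it's expected that your two gzip files aren't identical: test.svn and test2.svn have the same content but different names (and possibly different modification times), and that metadata ends up in the headers. You can work around this by passing gzip the -n (--no-name) flag, which stops it from including the original filename and timestamp in the header.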
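The effect of -n can be approximated with Python's gzip module. This is a minimal sketch, assuming that a zero MTIME and no FNAME field are what make the output reproducible. The helper name gzip_reproducible is hypothetical, and because Python's gzip module is a different implementation from the gzip command, its output should not be expected to match gzip -n byte for byte; the point is only that identical input now yields identical archives and therefore identical md5 sums.

```python
import gzip
import hashlib
import io


def gzip_reproducible(data: bytes) -> bytes:
    """Hypothetical helper: no stored name and MTIME = 0, in the spirit of `gzip -n`."""
    buf = io.BytesIO()  # BytesIO has no .name, so GzipFile writes no FNAME field
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()


test_svn = b"SVN-fs-dump-format-version: 2\n"  # stand-in for test.svn
test2_svn = bytes(test_svn)  # byte-identical copy, like test2.svn
first = hashlib.md5(gzip_reproducible(test_svn)).hexdigest()
second = hashlib.md5(gzip_reproducible(test2_svn)).hexdigest()
print(first == second)  # True
```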
