Compressing two identical folders gives different results

files reproducible-build tar

I have two identical folders, with same structure and contents like this:

folder_1
  hello.txt
  subfolder
    byebye.txt

folder_2
  hello.txt
  subfolder
    byebye.txt

If I compress them as tar.xz archives, I get two different archives with two different file sizes (they differ by just a few bytes, but they're not identical).

$ (cd folder_1 && tar -Jcf archive.tar.xz *)
$ (cd folder_2 && tar -Jcf archive.tar.xz *)

I get:

folder_1/archive.tar.xz != folder_2/archive.tar.xz

And of course, if I run md5sum or sha1sum on them, I get two different hashes.

And that's my problem… I need to check whether a provided archive is identical to the one I have in my storage. As things stand, I can neither compare hashes nor simply check file sizes.

Using zip instead of tar.xz works, as zip always produces identical archives from identical files.
Why is this happening? Is there a way to prevent it?

Best Answer

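OK, the explanation given by ddnomad is correct. It's about the timestamp: tar records each file's modification time in the archive, so two folders with identical contents but different modification times produce different archives.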
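To see the stored timestamp for yourself, you can list an archive with full time resolution. The following is only a small sketch, assuming GNU tar and GNU touch; the file name demo.tar and the date are made up for illustration.

```shell
#!/bin/sh
# Sketch: show that tar records each member's modification time.
# Assumes GNU tar (--full-time) and GNU touch (-d with a date string).
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Create a file and give it a known, arbitrary modification time.
printf 'hello\n' > "$work/hello.txt"
touch -d '2001-02-03 04:05:06 UTC' "$work/hello.txt"

# Archive it without compression (demo.tar is a hypothetical name).
tar -C "$work" -cf "$work/demo.tar" hello.txt

# List the archive; TZ=UTC makes tar print the stored mtime in UTC.
listing=$(TZ=UTC tar --full-time -tvf "$work/demo.tar")
echo "$listing"
```

The listing line for hello.txt carries the date that was set with touch, which shows that the timestamp is part of the archive data and therefore part of anything you hash.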

Here is the solution:

Add --mtime='1970-01-01 00:00:00' to the tar command:

tar --mtime='1970-01-01 00:00:00' -Jcf archive.tar.xz *

This forces the timestamp of every archived file to a fixed value, so identical contents result in identical archives.
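As a rough end-to-end check, the sketch below builds the two folders from the question with different modification times and confirms that the resulting archives are byte-identical. It assumes GNU tar 1.28 or later and an installed xz. The extra flags go beyond what the answer above uses and are my own additions: --sort=name, --owner/--group/--numeric-owner and --format=gnu guard against differences in directory listing order, file ownership and default archive format, and the UTC suffix on --mtime avoids depending on the local time zone. The helper make_archive and the names a1.tar.xz and a2.tar.xz are hypothetical.

```shell
#!/bin/sh
# Sketch: identical contents, different mtimes, identical archives.
# Assumes GNU tar >= 1.28 (for --sort) and xz (for -J).
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Recreate the layout from the question in two separate folders.
for d in folder_1 folder_2; do
    mkdir -p "$work/$d/subfolder"
    printf 'hello\n' > "$work/$d/hello.txt"
    printf 'bye\n' > "$work/$d/subfolder/byebye.txt"
done

# Give folder_2 different modification times, as a later copy would have.
find "$work/folder_2" -exec touch -d '2001-02-03 04:05:06 UTC' {} +

# Hypothetical helper: archive the contents of directory $1 into file $2.
make_archive() {
    tar --format=gnu \
        --mtime='1970-01-01 00:00:00 UTC' \
        --sort=name \
        --owner=0 --group=0 --numeric-owner \
        -C "$1" -Jcf "$2" .
}

make_archive "$work/folder_1" "$work/a1.tar.xz"
make_archive "$work/folder_2" "$work/a2.tar.xz"

# cmp -s exits 0 only if the two files are byte-for-byte equal.
if cmp -s "$work/a1.tar.xz" "$work/a2.tar.xz"; then
    result=identical
else
    result=different
fi
echo "$result"
```

Writing the archive outside the folder being archived (via -C and a separate output path) also avoids any chance of the archive ending up inside itself on a second run.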
