Linux – Different hash value of large rsynced file on CentOS and Ubuntu

hashsum rsync selinux

I rsynced a large file from a remote CentOS machine to a local Ubuntu machine with:

rsync -avzP user@<remote-ip>:/path/to/file .

It reported the transfer went well:

sent 30 bytes  received 257,293,476 bytes  1,296,188.95 bytes/sec
total size is 8,217,194,015  speedup is 31.94
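rsync's "speedup" is the total size divided by the bytes sent plus received, so the summary itself shows that only about 3% of the file's size crossed the network. Because -z compresses the stream, that ratio mixes compression with any reuse of data already present locally; the summary alone can't separate the two. A quick check of the arithmetic, using only the figures printed above (the helper name is illustrative):

```python
# Figures copied from the rsync summary above.
SENT = 30
RECEIVED = 257_293_476
TOTAL_SIZE = 8_217_194_015


def rsync_speedup(total_size: int, sent: int, received: int) -> float:
    """rsync's reported speedup: total file size / (bytes sent + bytes received)."""
    return total_size / (sent + received)


speedup = rsync_speedup(TOTAL_SIZE, SENT, RECEIVED)
wire_fraction = (SENT + RECEIVED) / TOTAL_SIZE

print(f"speedup: {speedup:.2f}")  # 31.94, as rsync reported
print(f"bytes on the wire: {wire_fraction:.1%} of the file size")
```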

As far as I know rsync automatically verifies the transfer went well with hash checks after the transfer is completed.

Out of curiosity, I computed MD5 hashes on CentOS and Ubuntu, and they differ:

centos: 0faa300b7b0b81bfe65199da932eb6e2
ubuntu: f3a0fcc59516d4e68fd207bdbb1fc169

Both hashes are computed with md5sum:

centos> md5sum --version
md5sum (GNU coreutils) 8.22

ubuntu> md5sum --version
md5sum (GNU coreutils) 8.25

So the versions are slightly different, but can that lead to different hash values?
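MD5 is a fixed, standardized function (RFC 1321): any correct implementation, whatever its version, returns the same digest for the same bytes. A difference between md5sum 8.22 and 8.25 therefore can't explain different digests; different digests mean the bytes that were read differ. As an independent cross-check on both machines, here is a minimal Python sketch that should print the same digest md5sum does (the function name is illustrative):

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks so an 8 GB file
    never has to fit in memory. Chunking does not change the digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Usage: python md5check.py /path/to/file  (prints "digest  path" like md5sum)
    for p in sys.argv[1:]:
        print(f"{md5_of_file(p)}  {p}")
```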

Edit:

Here is the ls -l output:

centos: -rw-rw-r--.  1 username username 8217194015
ubuntu: -rw-rw-r--   1 username username 8217194015

The CentOS output includes a mysterious dot I've never seen before. (Could it be related to LVM? LVM is used on that CentOS machine.)
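In GNU ls, a "." right after the permission bits marks a file that carries an SELinux security context and no other alternate access method (a "+" would indicate, for example, an ACL). It is unrelated to LVM. The label is metadata stored in an extended attribute, not part of the file's contents, so it cannot affect an MD5 digest. A minimal Linux-only Python sketch that reads that label (ls -Z shows the same information); the function name is illustrative:

```python
import os
from typing import Optional


def selinux_context(path: str) -> Optional[str]:
    """Return the file's SELinux label (the 'security.selinux' extended
    attribute), or None if there is no label or xattrs are unsupported."""
    if not hasattr(os, "getxattr"):  # os.getxattr exists only on Linux
        return None
    try:
        raw = os.getxattr(path, "security.selinux")
    except OSError:  # no such attribute, no xattr support, missing file, ...
        return None
    # The kernel stores the label NUL-terminated; strip that before decoding.
    return raw.rstrip(b"\x00").decode(errors="replace")


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, selinux_context(p))
```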

Edit 2:

Running md5sum -b gives different results as well:

centos: 0faa300b7b0b81bfe65199da932eb6e2
ubuntu: 6d799f6981066d82c7f861576b4980e1
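With GNU md5sum, -b only changes how each output line is flagged (an asterisk before the file name); GNU/Linux makes no text/binary distinction, so the digest itself should not change. Here the CentOS digest stayed the same while the Ubuntu digest changed between the two runs. If the file was not modified in between, that suggests the bytes read back on Ubuntu differed from one read to the next, which is worth confirming directly. A minimal sketch, assuming nothing writes to the file while it runs, that hashes the same file several times and reports whether the digests agree (function names are illustrative):

```python
import hashlib
from typing import List


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path: str, runs: int = 3) -> List[str]:
    """Hash the same unchanged file several times."""
    return [md5_of_file(path) for _ in range(runs)]


def is_stable(digests: List[str]) -> bool:
    """True when every run produced the same digest."""
    return len(set(digests)) == 1
```

If the digests disagree, the bytes read back are changing between reads. If they agree, that is weaker evidence than it looks: repeat reads of a recently read file may be served from the page cache rather than the disk.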

What hash algorithm does rsync use? According to Wikipedia, rsync uses MD5 to check whether a chunk is the same:

The recipient splits its copy of the file into chunks and computes two checksums for each chunk: the MD5 hash, and a weaker but easier to compute 'rolling checksum'. It sends these checksums to the sender. The sender quickly computes the rolling checksum for each chunk in its version of the file; if they differ, it must be sent. If they're the same, the sender uses the more computationally expensive MD5 hash to verify the chunks are the same.
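The weak "rolling checksum" in that description is cheap because it can slide one byte along the data in constant time, so the sender can test every offset. Below is a minimal Python sketch of the Adler-32-style checksum from the original rsync technical report (modulus 2^16), followed by the MD5 confirmation step the quote describes. It is an illustration only: the actual rsync source differs in details, and all names here are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

M = 1 << 16  # modulus used in the rsync technical report


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """(a, b) for one window: a = sum of bytes, b = position-weighted sum, both mod 2^16."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window right by one byte in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def combined(a: int, b: int) -> int:
    """Pack both halves into one 32-bit value: s = a + 2^16 * b."""
    return a + (b << 16)


def block_signatures(old: bytes, n: int) -> Dict[int, List[Tuple[int, str]]]:
    """Recipient side: weak checksum -> [(block index, MD5)] for each full n-byte block."""
    sigs: Dict[int, List[Tuple[int, str]]] = {}
    for start in range(0, len(old) - n + 1, n):
        block = old[start:start + n]
        sigs.setdefault(combined(*weak_checksum(block)), []).append(
            (start // n, hashlib.md5(block).hexdigest()))
    return sigs


def find_matches(new: bytes, sigs: Dict[int, List[Tuple[int, str]]],
                 n: int) -> List[Tuple[int, int]]:
    """Sender side: slide over `new`; on a weak hit, confirm with MD5.
    Returns (offset in new, matching block index) pairs."""
    matches: List[Tuple[int, int]] = []
    off = 0
    a, b = weak_checksum(new[:n])
    while off + n <= len(new):
        hit = None
        candidates = sigs.get(combined(a, b), [])
        if candidates:
            strong = hashlib.md5(new[off:off + n]).hexdigest()  # only on a weak hit
            hit = next((idx for idx, md5 in candidates if md5 == strong), None)
        if hit is not None:
            matches.append((off, hit))
            off += n  # jump past the matched block and restart the window
            a, b = weak_checksum(new[off:off + n])
        else:
            if off + n < len(new):
                a, b = roll(a, b, new[off], new[off + n], n)
            off += 1
    return matches
```

In this sketch, bytes of `new` not covered by a match would have to be sent literally. The MD5 step keeps a weak-checksum collision from being mistaken for a real match.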

Best Answer

There's a wrong assumption here:

As far as I know rsync automatically verifies the transfer went well with hash checks after the transfer is completed.

Rsync uses checksums to determine whether a sync is needed. However, rsync does not reread the copy it has written; it trusts the kernel to report errors. So the conclusion is simple: the files are not identical. The difference could be a single bit or more; a checksum can't tell you how large the mismatch is.
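That last point is easy to demonstrate: flipping a single bit produces an unrelated MD5 digest, so the digest says nothing about how much two copies differ. To find where they differ without copying the whole file again, one option (an assumption, not something suggested in the thread) is to hash fixed-size blocks on each machine and compare the two lists, much like rsync's per-chunk checksums. A minimal Python sketch; the block size and function names are illustrative:

```python
import hashlib
from typing import List


def block_digests(data: bytes, block_size: int) -> List[str]:
    """MD5 of each consecutive block_size chunk (the last one may be shorter)."""
    return [hashlib.md5(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def differing_blocks(a: List[str], b: List[str]) -> List[int]:
    """Indices whose digests differ, plus blocks present in only one list."""
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    diffs.extend(range(min(len(a), len(b)), max(len(a), len(b))))
    return diffs


original = bytes(range(256)) * 64  # 16 KiB of sample data
corrupted = bytearray(original)
corrupted[10_000] ^= 0x01  # flip a single bit

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(bytes(corrupted)).hexdigest())  # an unrelated digest
print(differing_blocks(block_digests(original, 4096),
                       block_digests(bytes(corrupted), 4096)))  # [2]
```

For an 8 GB file you would read and hash each block from disk instead of holding the whole file in memory. Only the short list of block digests needs to move between the machines.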
