This, like most things, is explained in rsync's extremely comprehensive man page (emphasis mine):
When the file transfer finishes, rsync replaces the progress line
with a summary line that looks like this:
1,238,099 100% 146.38kB/s 0:00:08 (xfr#5, to-chk=169/396)
In this example, the file was 1,238,099 bytes long in total, the
average rate of transfer for the whole file was 146.38 kilobytes per
second over the 8 seconds that it took to complete, it was the 5th
transfer of a regular file during the current rsync session, and there
are 169 more files for the receiver to check (to see if they are
up-to-date or not) remaining out of the 396 total files in the
file-list.
So, in your example, assembly/2. East Rutherford - English - Friday PM.mp4 was the second file to be transferred, and another 5 of a total of 8 files will need to be checked.
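As a small illustration of the summary-line fields described above, here is a hypothetical Python helper (`parse_summary` and `Progress` are made-up names, not part of rsync) that extracts the xfr# and to-chk counters from a line. It assumes the exact `(xfr#N, to-chk=R/T)` layout quoted in the man page excerpt.

```python
import re
from typing import NamedTuple

# Assumes the "(xfr#N, to-chk=R/T)" layout quoted from the man page.
SUMMARY_RE = re.compile(r"\(xfr#(\d+), to-chk=(\d+)/(\d+)\)")


class Progress(NamedTuple):
    transfer_no: int  # N-th regular-file transfer in this rsync session
    remaining: int    # files the receiver still has to check
    total: int        # files in the file list


def parse_summary(line: str) -> Progress:
    """Extract the xfr#/to-chk counters from an rsync summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"no xfr#/to-chk field in {line!r}")
    return Progress(*(int(g) for g in m.groups()))


if __name__ == "__main__":
    p = parse_summary("1,238,099 100% 146.38kB/s 0:00:08 (xfr#5, to-chk=169/396)")
    print(p, "already checked:", p.total - p.remaining)
```

For the man page line this gives transfer 5 with 169 of 396 files still to check, so 227 have already been dealt with.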
By default rsync compares files by size and timestamp, but a device does not report a meaningful size, so rsync must calculate the differences using the delta algorithm described in the rsync technical report, "The rsync algorithm" by Andrew Tridgell and Paul Mackerras.
Loosely, the remote (receiving) copy is divided into blocks of a chosen size, and a checksum for each block is sent back. The local file is then scanned with a rolling checksum at every byte offset, not just at block boundaries, and each value is compared with that list. Finally the remote is told how to reassemble the blocks it already has into the new file, and the data for any regions that do not match is sent as-is.
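To make that description concrete, here is a minimal Python sketch of the block-matching idea. It is not rsync's code, and all function names are hypothetical: the weak checksum follows the (a, b) formulation in the tech report, MD5 stands in for the strong checksum and is compared in full, a plain dict keyed on the weak sum replaces rsync's hash table, and the short final block is handled naively. It does count matches, false alarms and literal data, the same quantities rsync prints in its debug totals.

```python
import hashlib
import random

M = 1 << 16  # both halves of the weak checksum are kept modulo 2**16


def weak_sum(block: bytes) -> tuple[int, int]:
    """Weak checksum (a, b) as formulated in the rsync tech report."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, blen: int) -> tuple[int, int]:
    """Slide a blen-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - blen * out_byte + a) % M
    return a, b


def signatures(basis: bytes, blen: int) -> dict:
    """Receiver side: weak sum -> list of (block index, MD5 digest)."""
    table: dict = {}
    for idx, off in enumerate(range(0, len(basis), blen)):
        block = basis[off:off + blen]
        table.setdefault(weak_sum(block), []).append((idx, hashlib.md5(block).digest()))
    return table


def delta(basis: bytes, new: bytes, blen: int):
    """Sender side: scan `new` at every offset, emitting block refs or literal data."""
    table = signatures(basis, blen)
    ops, literal = [], bytearray()
    stats = {"matches": 0, "false_alarms": 0, "data": 0}

    def flush() -> None:
        if literal:
            ops.append(("data", bytes(literal)))
            stats["data"] += len(literal)
            literal.clear()

    k, ws = 0, None  # ws caches the weak sum of new[k:k + blen]
    while k < len(new):
        window = new[k:k + blen]
        if ws is None:
            ws = weak_sum(window)
        hit, strong = None, None
        for idx, digest in table.get(ws, []):
            strong = strong or hashlib.md5(window).digest()  # computed at most once
            if strong == digest:
                hit = idx
                break
            stats["false_alarms"] += 1  # weak sum matched, strong sum did not
        if hit is not None:
            flush()
            ops.append(("block", hit))
            stats["matches"] += 1
            k, ws = k + len(window), None  # skip the whole block, restart the sum
        else:
            literal.append(new[k])
            if k + blen < len(new):
                ws = roll(*ws, new[k], new[k + blen], blen)
            else:
                ws = None  # window shrinks near the end, so recompute it
            k += 1
    flush()
    return ops, stats


if __name__ == "__main__":
    basis = random.Random(1).randbytes(1050)
    new = basis[:300] + b"test\n" + basis[305:]  # corrupt 5 bytes inside block 3
    ops, stats = delta(basis, new, 100)
    print(stats)
```

With 5 bytes overwritten inside block 3 of an 11-block buffer, this should report 10 matches and 100 bytes of literal data: the whole damaged block is re-sent, in the same way a whole 100000-byte block is in the rsync experiment.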
You can see this by asking for debug output at level 3 just for the delta-sum code, with the option --debug=deltasum3. You can specify a block size with -B to simplify the numbers, and --no-W is needed because rsync defaults to whole-file copying when both paths are local. For example, for a file that has already been copied once, a second run of
rsync -B 100000 --copy-devices -avv --debug=deltasum3 --no-W /dev/sdd /tmp/mysdd
produces output like this showing the checksum for each block:
count=164 rem=84000 blength=100000 s2length=2 flength=16384000
chunk[0] offset=0 len=100000 sum1=61f6893e
chunk[1] offset=100000 len=100000 sum1=32f30ba3
chunk[2] offset=200000 len=100000 sum1=45b1f9e5
...
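The header numbers fit together simply: count is flength divided by blength rounded up, and rem is what is left over for the last, short chunk. A small sketch with a hypothetical helper name, assuming that is how the fields relate (the numbers shown bear it out):

```python
def chunk_layout(flength: int, blength: int):
    """Return (count, rem, [(index, offset, length), ...]) for a file of flength bytes."""
    count = (flength + blength - 1) // blength  # ceiling division
    rem = flength % blength                     # size of the short last chunk, 0 if none
    chunks = []
    for i in range(count):
        length = rem if (i == count - 1 and rem) else blength
        chunks.append((i, i * blength, length))
    return count, rem, chunks


if __name__ == "__main__":
    count, rem, chunks = chunk_layout(16384000, 100000)
    print(count, rem)   # 164 84000
    print(chunks[80])   # (80, 8000000, 100000)
    print(chunks[-1])   # (163, 16300000, 84000)
```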
You can then see it matching the checksums of the other device fairly trivially, since there are no differences:
potential match at 0 i=0 sum=61f6893e
match at 0 last_match=0 j=0 len=100000 n=0
potential match at 100000 i=1 sum=32f30ba3
match at 100000 last_match=100000 j=1 len=100000 n=0
potential match at 200000 i=2 sum=45b1f9e5
match at 200000 last_match=200000 j=2 len=100000 n=0
...
At the end the data= field is 0, showing no new data was sent.
total: matches=164 hash_hits=164 false_alarms=0 data=0
If we now corrupt the copy by overwriting the middle of the file:
echo test | dd conv=block,notrunc seek=80 bs=100000 of=/tmp/mysdd
touch -r /dev/sdd /tmp/mysdd
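With bs=100000 and seek=80, dd writes at byte offset 80 × 100000 = 8000000, which is exactly where chunk[80] starts, and touch -r then resets the copy's timestamps from the device. If you want to repeat the experiment on ordinary files, a rough Python stand-in for those two steps might look like this. corrupt_block and demo are hypothetical helpers, and the payload is written verbatim rather than reproducing dd's conv=block handling exactly.

```python
import os
import tempfile


def corrupt_block(path: str, block_index: int, blength: int, payload: bytes, reference: str) -> None:
    """Overwrite the start of one block in place, then copy reference's timestamps.

    Rough stand-in for `dd ... notrunc seek=N bs=B` followed by `touch -r`;
    it writes `payload` verbatim rather than emulating dd's conv=block.
    """
    with open(path, "r+b") as f:  # r+b writes in place and never truncates
        f.seek(block_index * blength)
        f.write(payload)
    st = os.stat(reference)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))  # like touch -r


def demo() -> tuple[bytes, int]:
    """Corrupt block 3 of a 1000-byte scratch file whose blocks are 100 bytes."""
    with tempfile.TemporaryDirectory() as d:
        copy, ref = os.path.join(d, "copy.img"), os.path.join(d, "ref.img")
        for p in (copy, ref):
            with open(p, "wb") as f:
                f.write(bytes(1000))
        os.utime(ref, ns=(1_000_000_000, 2_000_000_000))
        corrupt_block(copy, 3, 100, b"test\n", ref)
        with open(copy, "rb") as f:
            return f.read(), os.stat(copy).st_mtime_ns
```

Restoring the timestamp matters because otherwise the copy's mtime would differ from the source's after the corruption.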
then the rsync debug shows us a new checksum for block 80 but no match for it. We go from match 79 to match 81:
chunk[80] offset=8000000 len=100000 sum1=a73cccfe
...
potential match at 7900000 i=79 sum=58eabec6
match at 7900000 last_match=7900000 j=79 len=100000 n=0
potential match at 8100000 i=81 sum=eba488ba
match at 8100000 last_match=8000000 j=81 len=100000 n=100000
At the end we have data=100000, showing that a whole new data block had to be sent.
total: matches=163 hash_hits=385 false_alarms=0 data=100000
The number of matches has dropped by 1, because the checksum of the corrupted block failed to match. Perhaps the hash hits rise because we lost sequential matching, so rsync has to look up the rolling checksum at every byte offset until it finds chunk 81 again.
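In these lines the n= field of each match equals offset minus last_match, i.e. the literal bytes sent since the previous match, and they appear to add up to the data= total. A hypothetical Python sketch that checks this against the corrupted run's output; the regexes assume the exact line formats shown above, and any literal data after the final match would not appear on a match line.

```python
import re

MATCH_RE = re.compile(r"match at (\d+) last_match=(\d+) j=(\d+) len=(\d+) n=(\d+)")
TOTAL_RE = re.compile(
    r"total: matches=(\d+)\s+hash_hits=(\d+)\s+false_alarms=(\d+)\s+data=(\d+)"
)

SAMPLE = """\
potential match at 7900000 i=79 sum=58eabec6
match at 7900000 last_match=7900000 j=79 len=100000 n=0
potential match at 8100000 i=81 sum=eba488ba
match at 8100000 last_match=8000000 j=81 len=100000 n=100000
total: matches=163 hash_hits=385 false_alarms=0 data=100000
"""


def summarize(log: str) -> dict:
    """Sum literal bytes (n=) over match lines and read the final totals line."""
    literal, match_lines, totals = 0, 0, None
    for line in log.splitlines():
        line = line.strip()
        if m := MATCH_RE.match(line):  # .match() skips "potential match" lines
            offset, last, _j, _len, n = map(int, m.groups())
            if n != offset - last:
                raise ValueError(f"unexpected n= in {line!r}")
            literal += n
            match_lines += 1
        elif t := TOTAL_RE.search(line):
            keys = ("matches", "hash_hits", "false_alarms", "data")
            totals = dict(zip(keys, map(int, t.groups())))
    return {"literal_from_matches": literal, "match_lines": match_lines, "totals": totals}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```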
If we look further in the same tech report, some test results are shown, and the false alarms are described as "the number of times the 32 bit rolling checksum matched but the strong checksum did not". Each block gets a simple checksum and an MD5 checksum (MD4 in older versions). The simple checksum is easy to search for using a hash table, as it is a 32-bit integer. Once it matches an entry, the strong checksum is also compared; note that rsync sends only the first s2length bytes of it per block (2 in the output above) and relies on a whole-file checksum to catch anything that slips through. If the strong checksum does not match, it is a false alarm, and the search continues.
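A false alarm needs two different blocks with the same weak checksum. Using the tech report's formulation of the weak sum (a is the plain byte sum, b weights each byte by its distance from the end of the block, both modulo 2**16), such collisions are easy to construct by hand. This small sketch shows one, with MD5 as the strong checksum:

```python
import hashlib

M = 1 << 16  # weak checksum halves are kept modulo 2**16


def weak_sum(block: bytes) -> tuple[int, int]:
    """Weak checksum (a, b) as formulated in the rsync tech report."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


x = bytes([0, 1, 1, 0])
y = bytes([1, 0, 0, 1])
print(weak_sum(x), weak_sum(y))  # (2, 5) (2, 5): same weak checksum
print(hashlib.md5(x).digest() == hashlib.md5(y).digest())  # False: strong check rejects it
```

If y were a block of the receiver's file and the sender's scan hit x, the weak lookup would succeed and the strong comparison would fail, adding one to false_alarms.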
My example uses a very small (and old) 16 MB USB key, and the minimum hash table size is 2**16, i.e. 65536 entries, so the table is pretty empty when holding my 164 chunk entries, and false_alarms stays at 0. With bigger files and many more blocks, false alarms are normal, and more an indication of efficiency than anything else.
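That near-empty table may also account for the jump in hash_hits from 164 to 385 in the corrupted run. A rough back-of-envelope estimate, assuming that after losing the match at chunk 80 rsync looks up the rolling checksum at each of the 100000 byte offsets before chunk 81, that those lookups land on uniformly random slots of a 65536-slot table, that at most 164 slots are occupied, and that each successful match costs one hit (as in the clean run, 164 hits for 164 matches):

```python
TABLE_SLOTS = 2 ** 16      # minimum hash table size mentioned above
OCCUPIED = 164             # at most one slot per chunk (collisions only lower this)
SCANNED_OFFSETS = 100_000  # offsets 8000000..8099999 between the two matches

# Assumption: each scanned offset hits an occupied slot with probability OCCUPIED/TABLE_SLOTS.
expected_extra = SCANNED_OFFSETS * OCCUPIED / TABLE_SLOTS
# Assumption: one hash hit per successful match, so subtract the 163 matches.
observed_extra = 385 - 163

print(f"expected ~{expected_extra:.0f}, observed {observed_extra}")
```

This prints an expectation of about 250 against 222 observed. The two are the same order of magnitude, which is consistent with the "lost sequential matching" guess, though the uniform-hash model is only an approximation.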
Best Answer
As per https://lists.samba.org/archive/rsync/2004-November/011057.html, false_alarms just tells you how many times a block's weak checksum matched but the strong checksum did not. rsync only reports this statistic when debugging levels of verbosity are set.