How does the --fuzzy option for rsync work?

rsync

How does rsync --fuzzy work? I do not get the results I expect.

From the manual:

This option tells rsync that it should look for a basis file for any destination file that is missing. The current algorithm looks in the
same directory as the destination file for either a file that has an identical size and modified-time, or a similarly-named file. If found,
rsync uses the fuzzy basis file to try to speed up the transfer.

If the option is repeated, the fuzzy scan will also be done in any matching alternate destination directories that are specified via --compare-dest,
--copy-dest, or --link-dest.

Note that the use of the --delete option might get rid of any potential fuzzy-match files, so either use --delete-after or specify some
filename exclusions if you need to prevent this.

I therefore expect the second rsync run in the following shell script to use destination/a1 as the basis for destination/a2, in effect renaming it. However, as I read the output, that is not what happens (Matched data: 0 bytes).

#!/usr/bin/env bash
set -e

cd "$(mktemp -d)"
mkdir source destination
head --bytes=1M /dev/urandom > source/a1
rsync --recursive --times "$(pwd)/source/" "$(pwd)/destination/"
tree
mv source/a1 source/a2
rsync \
    --verbose \
    --recursive \
    --times \
    --delete \
    --delete-after \
    --fuzzy \
    --human-readable \
    --itemize-changes \
    --stats \
    "$(pwd)/source/" \
    "$(pwd)/destination/"
tree
rm -r source destination

Output:

.
├── destination
│   └── a1
└── source
    └── a1

2 directories, 2 files
building file list ... done
>f+++++++++ a2
*deleting   a1

Number of files: 2 (reg: 1, dir: 1)
Number of created files: 1 (reg: 1)
Number of deleted files: 1 (reg: 1)
Number of regular files transferred: 1
Total file size: 1.05M bytes
Total transferred file size: 1.05M bytes
Literal data: 1.05M bytes
Matched data: 0 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 1.05M
Total bytes received: 34

sent 1.05M bytes  received 34 bytes  2.10M bytes/sec
total size is 1.05M  speedup is 1.00
.
├── destination
│   └── a2
└── source
    └── a2

2 directories, 2 files

Output of rsync --version:

rsync  version 3.1.2  protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.

How does rsync --fuzzy work?

Why do I not get the results I expect?

Best Answer

You're using rsync to copy files between two local file trees. When both the source and the destination are local paths, rsync defaults to --whole-file, so the delta-transfer algorithm and the optimisations that depend on it, such as --fuzzy, are not used.
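If you would rather stay on the local machine, the rsync manual documents --no-whole-file as the way to turn the delta-transfer algorithm back on for local copies. The following is a minimal sketch of your test with that flag added, not a verified reproduction: the variable names `work` and `stats` are my own, and I would expect (but have not confirmed on your rsync 3.1.2) that the second run then reports a non-zero "Matched data" figure.

```shell
#!/usr/bin/env bash
# Sketch: the question's local-to-local test, with delta transfers forced on.
# "work" and "stats" are illustrative names, not anything rsync defines.

work=$(mktemp -d)
mkdir "$work/source" "$work/destination"

# 1 MiB of random data, as in the original script.
head -c 1048576 /dev/urandom > "$work/source/a1"

# First run: plain copy so that destination/a1 exists with the same mtime.
rsync --recursive --times "$work/source/" "$work/destination/"

# Rename on the source side only.
mv "$work/source/a1" "$work/source/a2"

# Second run: --no-whole-file overrides the local-copy default of --whole-file,
# and --delete-after keeps destination/a1 around long enough to act as a basis.
stats=$(rsync --recursive --times --delete --delete-after \
              --fuzzy --no-whole-file --stats \
              "$work/source/" "$work/destination/")

# Show only the two lines that reveal whether a basis file was used.
printf '%s\n' "$stats" | grep -E '^(Literal|Matched) data'

# The temporary directory is left in place for inspection; remove it with:
#   rm -r "$work"
```

Bear in mind that for purely local copies the delta algorithm usually costs more than it saves, which is why --whole-file is the default there; the flag is useful here mainly to observe what --fuzzy does.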

Repeat your test with a local file being copied to a remote server (or remote to local; it doesn't matter) and you'll find it works as expected.

For example, change both occurrences of $(pwd)/destination in your script to localhost:$(pwd)/destination. It's not elegant, but it will suffice.

# Set up key-based SSH authentication for localhost
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
ssh localhost id

Script results from the second rsync:

building file list ... done
<f+++++++++ a2
*deleting   a1

Number of files: 2 (reg: 1, dir: 1)
Number of created files: 1 (reg: 1)
Number of deleted files: 1 (reg: 1)
Number of regular files transferred: 1
Total file size: 1.05M bytes
Total transferred file size: 1.05M bytes
Literal data: 0 bytes
Matched data: 1.05M bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 4.20K
Total bytes received: 6.18K

sent 4.20K bytes  received 6.18K bytes  20.75K bytes/sec
total size is 1.05M  speedup is 101.09
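The two lines to compare between your run and this one are "Literal data" (bytes actually sent) and "Matched data" (bytes reconstructed from a basis file on the receiving side). As a small convenience, assuming the --stats output format shown above, a helper like the following can pull out just those two lines; `transfer_summary` is a hypothetical name of my own, not part of rsync.

```shell
#!/usr/bin/env bash
# Sketch: reduce rsync --stats output (read from stdin) to the two lines that
# show whether a basis file was used. "transfer_summary" is a made-up name.
transfer_summary() {
    # Split each line on ": " and keep only the Literal/Matched data entries.
    awk -F': ' '/^(Literal|Matched) data:/ { print $1 "=" $2 }'
}

# Demonstration using two lines copied from the output above.
transfer_summary <<'EOF'
Literal data: 0 bytes
Matched data: 1.05M bytes
EOF
```

In practice you would pipe a real run into it, for example by appending `| transfer_summary` to the second rsync command in your script.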