I managed to find a solution which currently works for me, though I am unsure whether it is a guaranteed correct approach.
In order to avoid the permission issues, I need to add the following to my rsync command: --no-p --chmod=ugo=rwX. Thus, my rsync backup command looks like the following:
rsync -avhP --no-p --chmod=ugo=rwX --delete --log-file="/cygdrive/C/Users/MyUsername/rsync-backup-log.txt" "/cygdrive/C/Users/MyUsername/Folder A" "/cygdrive/E/Backup/"
Credit for this solution goes to the following answer to a similar post: https://superuser.com/a/69764/607501
You seem pretty set on using rsync and a Raspberry Pi, so here's another answer with a bit of a brain dump that will hopefully help you come to a solution.
Now I'm wondering if there is any way to view a recursive snapshot, including all the datasets, or whether there is some other recommended way to rsync an entire zpool.
Not that I know of... I expect that the recommendations would be along the lines of my other answer.
If you were content with simply running rsync on the mounted ZFS pool, then you could either exclude the .zfs directories (if they're visible to you) using rsync --exclude='/.zfs/', or set the snapdir=hidden property.
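To make that concrete, either of the following would keep the snapshot directories out of the transfer (the dataset name and destination below are only placeholders, not from your setup):
$ # hide the .zfs control directory on the dataset itself
$ zfs set snapdir=hidden tank/data
$ # or leave it visible and simply tell rsync to skip it
$ rsync -avP --exclude='/.zfs/' /tank/data/ backup-host:/backups/data/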
Running against the live mounts causes issues though, as each dataset can be mounted anywhere, and you probably don't want to miss any...
You'll want to manage snapshots: create a new snapshot for "now", back it up, and probably delete it afterwards. Taking this approach (rather than just using the "live" mounted filesystems) will give you a consistent backup of a point in time. It will also ensure that you don't back up any strange hierarchies or miss any filesystems that may be mounted elsewhere.
$ SNAPSHOT_NAME="rsync_$(date +%s)"
$ zfs snapshot -r ${ROOT}@${SNAPSHOT_NAME}
$ # do the backup...
$ zfs destroy -r ${ROOT}@${SNAPSHOT_NAME}
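If you want to double-check that the recursive snapshot landed on every dataset before starting the transfer, listing the snapshots is a quick sanity check (using the same variables as above):
$ zfs list -Hrt snapshot -o name ${ROOT} | grep "@${SNAPSHOT_NAME}"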
Next you'll need to get a full list of datasets that you'd like to back up, by running zfs list -Hrt filesystem -o name ${ROOT}. For example, I might like to back up my users tree:
$ zfs list -Hrt filesystem -o name ell/users
ell/users
ell/users/attie
ell/users/attie/archive
ell/users/attie/dropbox
ell/users/attie/email
ell/users/attie/filing_cabinet
ell/users/attie/home
ell/users/attie/photos
ell/users/attie/junk
ell/users/nobody
ell/users/nobody/downloads
ell/users/nobody/home
ell/users/nobody/photos
ell/users/nobody/scans
This gives you a recursive list of the filesystems that you are interested in...
You may like to skip certain datasets though, and I'd recommend using a property to achieve this - for example, rsync:sync=false would prevent syncing that dataset. This is the same approach that I've recently added to syncoid.
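Setting such a property is a one-liner - for example, to mark the junk dataset shown in the listing below (ZFS user properties just need to contain a colon; the name itself is up to you):
$ zfs set rsync:sync=false ell/users/attie/junk
$ # and to clear it again later:
$ zfs inherit rsync:sync ell/users/attie/junk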
The fields below are separated by a tab character.
$ zfs list -Hrt filesystem -o name,rsync:sync ell/users
ell/users -
ell/users/attie -
ell/users/attie/archive -
ell/users/attie/dropbox -
ell/users/attie/email -
ell/users/attie/filing_cabinet -
ell/users/attie/home -
ell/users/attie/photos -
ell/users/attie/junk false
ell/users/nobody -
ell/users/nobody/downloads -
ell/users/nobody/home -
ell/users/nobody/photos -
ell/users/nobody/scans -
You also need to understand that because ZFS datasets can be mounted anywhere (as pointed out above), it is not really okay to think of them as they are presented in the VFS... They are separate entities, and you should handle them as such.
To achieve this, we'll flatten out the filesystem names by replacing any forward slash / with three underscores ___ (or some other delimiter that won't typically appear in a filesystem's name).
$ filesystem="ell/users/attie/archive"
$ echo "${filesystem//\//___}"
ell___users___attie___archive
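The same substitution can be reversed if you ever need to map a flattened name back to a dataset (for example when restoring):
$ flattened="ell___users___attie___archive"
$ echo "${flattened//___//}"
ell/users/attie/archive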
This can all come together into a simple bash script... something like this:
NOTE: I've only briefly tested this... and there should be more error handling.
#!/bin/bash -eu
ROOT="${ZFS_ROOT}"
SNAPSHOT_NAME="rsync_$(date +%s)"
TMP_MNT="$(mktemp -d)"
RSYNC_TARGET="${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_PATH}"
# take the snapshots
zfs snapshot -r "${ROOT}"@"${SNAPSHOT_NAME}"
# push the changes... mounting each snapshot as we go
zfs list -Hrt filesystem -o name,rsync:sync "${ROOT}" \
  | while read -r filesystem sync; do
    [ "${sync}" == "false" ] && continue

    echo "Processing ${filesystem}..."

    # make a safe target for us to use... flattening out the ZFS hierarchy
    rsync_target="${RSYNC_TARGET}/${filesystem//\//___}"

    # mount, rsync, umount
    mount -t zfs -o ro "${filesystem}"@"${SNAPSHOT_NAME}" "${TMP_MNT}"
    rsync -avP --exclude="/.zfs/" "${TMP_MNT}/" "${rsync_target}"
    umount "${TMP_MNT}"
  done
# destroy the snapshots
zfs destroy -r "${ROOT}"@"${SNAPSHOT_NAME}"
# double check it's not mounted, and get rid of it
umount "${TMP_MNT}" 2>/dev/null || true
rm -rf "${TMP_MNT}"
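Assuming you save that as something like zfs-rsync-backup.sh (the filename and the values below are placeholders), it is driven entirely by environment variables and needs to run as root because of the mount and zfs calls:
$ chmod +x zfs-rsync-backup.sh
$ sudo ZFS_ROOT=ell/users REMOTE_USER=backup REMOTE_HOST=raspberrypi \
    REMOTE_PATH=/mnt/backup ./zfs-rsync-backup.sh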
Calculating the size of hard links in Windows
It's difficult to calculate the size of hard-linked files in Windows. One tool that allows you to do this is TreeSize Professional (not free, and its analysis of hard links is switched off by default). I used this tool and it correctly estimated the size of the hard-linked files.
For a more thorough discussion, see How can I check the actual size used in an NTFS directory with many hardlinks?
Are the files actually working?
As for the other part of the question: is it risky to back up files using Linux tools under Windows on WSL? I decided to test this simply by copying one of the snapshot directories to a separate external hard drive. There were no problems copying the files or reading them from the external drive. In other words, the hard links are behaving exactly as expected, and the files are working.
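If you want an extra sanity check that two paths really are hard links to the same file, Windows can enumerate the links itself; the path below is just a placeholder:
fsutil hardlink list "C:\Backups\2021-01-01\Documents\report.docx"
Every path printed refers to the same underlying file data, so a matching entry from an earlier snapshot directory confirms the link is intact.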
Long-term data stability
So, down to the final point: could using Linux tools under WSL as part of my regular backups break something, such as corrupting the file system? Do I trust WSL not to break things in a major way? Anything can break at any time, so I will make sure these snapshot directories get copied periodically to a separate drive.