Backup ZFS pool using rsync

backup, freenas, rsync, zfs

I currently have a FreeNAS box for storing my personal files. I'd like to have an offsite backup, but I'm not willing to spend the money on a second computer capable of running ZFS properly. Therefore I was planning to make the remote backups using rsync.

I would like all the files in the backup to be consistent, which I thought I could achieve by taking a recursive snapshot first and then transferring that using rsync. It turns out, however, that a separate snapshot is taken for each dataset.
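For illustration (the pool and dataset names here are made up), a recursive snapshot really does create one snapshot per dataset rather than a single pool-wide object:

$ zfs snapshot -r tank@before-backup
$ zfs list -Ht snapshot -o name
tank@before-backup
tank/documents@before-backup
tank/photos@before-backup

Each of these is only visible under its own dataset's .zfs/snapshot directory, so there is no single directory tree spanning the whole pool that I could point rsync at.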

Now I'm wondering if there is any way to view a recursive snapshot, including all the datasets, or whether there is some other recommended way to rsync an entire zpool. I don't think simply symlinking to the .zfs folders in the datasets will work as I'd like rsync to keep any symlinks that are present in the datasets themselves.

Based on the comments I received, I think some details on my desired configuration are in order. I'm looking to have a NAS at home that I can comfortably put data on, knowing that it's unlikely I'll ever lose it. For me this means having multiple copies on-site, multiple copies offsite, an offline copy in case things go really bad, periodic snapshots of the data in case of accidental deletion, and a means to prevent data errors (e.g. bit rot). The less likely the event is to occur, the more relaxed I am about not having multiple copies of the data after a catastrophe, and the less I care about snapshots. Also, I care about old data more than new data, as I usually have a copy of the latter on another device. Finally, I should note that most files do not get updated very often; most of the transfers will be new files.

My previous setup was a pair of Raspberry Pis with attached 4TB external hard drives. I lost trust in this strategy, but had the hardware readily available. After some research, it seemed that the only way to prevent errors from sneaking in over time was to go with a checksumming file system such as ZFS, combined with server-grade components such as ECC RAM and a UPS. For my local copy I went this route. I use 2x4TB disks in a mirror and take regular snapshots.

This machine should cover all cases except for the offsite and offline backups. Since I most likely won't need those backups, I'm not willing to invest too much in them. I therefore figured I could go with the Raspberry Pis and external disks I already had lying around. I could make it such that one of the disks is always offline, while the other is receiving the backups. Swapping the disks at regular intervals would then allow me to have an offline backup of my older data.

The straightforward route would be to use zfs send and receive to two pools, one on each disk. The Raspberry Pi, combined with the USB connection to the hard drive, would not, however, give ZFS (or any filesystem, for that matter) a very reliable environment to operate in, so I'm expecting errors to occur fairly regularly in that setup. Since I'll only be using one disk, ZFS would not have any reliable means to recover from failure.
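For reference, the send/receive route would look roughly like this (the pool, host, and snapshot names are placeholders):

$ zfs snapshot -r tank@offsite1
$ zfs send -R tank@offsite1 | ssh pi@offsite zfs receive -Fu backuppool

but that requires a healthy ZFS pool on the receiving side, which is exactly what I don't trust the Raspberry Pi to maintain.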

That is the reason I would like to go with ext3 or ext4 combined with rsync. Sure, some bad bits might be written to disk. In the case of metadata, there are tools to fix most of these issues. In the case of data blocks, it would mean the loss of a single file, and even then the file could be recovered using rsync -c, as that would detect the incorrect checksum and transfer the file again from the known-good copy on the local machine. Given the less-than-ideal hardware, this seems like the best solution possible.
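As a sketch of what I have in mind (the paths and host names are placeholders), a periodic checksum pass would re-transfer any file whose content no longer matches the source:

$ rsync -avc /mnt/tank/data/ pi@offsite:/mnt/usbdisk/data/

The -c option makes rsync compare full file checksums instead of size and modification time, so a silently corrupted file on the backup disk would be overwritten with the good copy from the FreeNAS box.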

That is my reasoning for using rsync, which led me to the original question of how to rsync a recursive ZFS snapshot. If I did not address any of your advice, please let me know, as I am really open to alternatives; I just do not currently see how they provide any advantage for me.

Best Answer

You seem pretty set on using rsync and a Raspberry Pi, so here's another answer with a bit of a brain dump that will hopefully help you come to a solution.


Now I'm wondering if there is any way to view a recursive snapshot, including all the datasets, or whether there is some other recommended way to rsync an entire zpool.

Not that I know of... I expect that the recommendations would be along the lines of my other answer.


If you were content with simply running rsync on the mounted ZFS pool, then you could either exclude the .zfs directories (if they're visible to you) using rsync --exclude='/.zfs/', or set the snapdir=hidden property.
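For example (the pool name and rsync target are placeholders), either of these would keep the snapshot directories out of the transfer:

$ rsync -av --exclude='/.zfs/' /mnt/tank/ backup:/backup/tank/
$ zfs set snapdir=hidden tank

With snapdir=hidden the .zfs directory can still be reached by name, but it no longer shows up in directory listings, so rsync won't descend into it.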

This causes issues though, as each dataset can be mounted anywhere, and you probably don't want to miss any...
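You can see where everything actually lives with something like this (the names and mountpoints are made up):

$ zfs list -Hr -o name,mountpoint tank
tank    /mnt/tank
tank/vms    /vm-storage
tank/home   /home

A dataset's mountpoint property doesn't have to sit underneath its parent's, so simply walking the directory tree from the pool's root mountpoint can miss datasets entirely.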


You'll want to manage snapshots, and will want to create a new snapshot for "now", back it up, and probably delete it afterwards. Taking this approach (rather than just using the "live" mounted filesystems) will give you a consistent backup of a point in time. It will also ensure that you don't backup any strange hierarchies or miss any filesystems that may be mounted elsewhere.

$ SNAPSHOT_NAME="rsync_$(date +%s)"
$ zfs snapshot -r ${ROOT}@${SNAPSHOT_NAME}
$ # do the backup...
$ zfs destroy -r ${ROOT}@${SNAPSHOT_NAME}

Next you'll need to get a full list of the datasets that you'd like to back up, by running zfs list -Hrt filesystem -o name ${ROOT}. For example, I might like to back up my users tree:

$ zfs list -Hrt filesystem -o name ell/users
ell/users
ell/users/attie
ell/users/attie/archive
ell/users/attie/dropbox
ell/users/attie/email
ell/users/attie/filing_cabinet
ell/users/attie/home
ell/users/attie/photos
ell/users/attie/junk
ell/users/nobody
ell/users/nobody/downloads
ell/users/nobody/home
ell/users/nobody/photos
ell/users/nobody/scans

This gives you a recursive list of the filesystems that you are interested in...

You may like to skip certain datasets though, and I'd recommend using a user property to achieve this - for example, setting rsync:sync=false on a dataset would mark it to be skipped. This is the same approach that I've recently added to syncoid.
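Setting and clearing the property is straightforward (dataset name taken from the example above):

$ zfs set rsync:sync=false ell/users/attie/junk
$ zfs inherit rsync:sync ell/users/attie/junk

User properties must contain a colon (which keeps them out of ZFS's own namespace), and they are inherited by child datasets unless overridden.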

The fields below are separated by a tab character.

$ zfs list -Hrt filesystem -o name,rsync:sync ell/users
ell/users   -
ell/users/attie -
ell/users/attie/archive -
ell/users/attie/dropbox -
ell/users/attie/email   -
ell/users/attie/filing_cabinet  -
ell/users/attie/home    -
ell/users/attie/photos  -
ell/users/attie/junk    false
ell/users/nobody    -
ell/users/nobody/downloads  -
ell/users/nobody/home   -
ell/users/nobody/photos -
ell/users/nobody/scans  -

You also need to understand that, because ZFS datasets can be mounted anywhere (as pointed out above), it is not really safe to think of them as they are presented in the VFS... they are separate entities, and you should handle them as such.

To achieve this, we'll flatten out the filesystem names by replacing any forward slash / with three underscores ___ (or some other delimiter that won't typically appear in a filesystem's name).

$ filesystem="ell/users/attie/archive"
$ echo "${filesystem//\//___}"
ell___users___attie___archive

This can all come together into a simple bash script... something like this:

NOTE: I've only briefly tested this... and there should be more error handling.

#!/bin/bash -eu

ROOT="${ZFS_ROOT}"
SNAPSHOT_NAME="rsync_$(date +%s)"
TMP_MNT="$(mktemp -d)"

RSYNC_TARGET="${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_PATH}"

# take the snapshots
zfs snapshot -r "${ROOT}"@"${SNAPSHOT_NAME}"

# push the changes... mounting each snapshot as we go
zfs list -Hrt filesystem -o name,rsync:sync "${ROOT}" \
    | while read -r filesystem sync; do
        [ "${sync}" == "false" ] && continue
        echo "Processing ${filesystem}..."

        # make a safe target for us to use... flattening out the ZFS hierarchy
        rsync_target="${RSYNC_TARGET}/${filesystem//\//___}"

        # mount, rsync, umount
        mount -t zfs -o ro "${filesystem}"@"${SNAPSHOT_NAME}" "${TMP_MNT}"
        rsync -avP --exclude="/.zfs/" "${TMP_MNT}/" "${rsync_target}"
        umount "${TMP_MNT}"
    done

# destroy the snapshots
zfs destroy -r "${ROOT}"@"${SNAPSHOT_NAME}"

# double check it's not mounted, and get rid of it
umount "${TMP_MNT}" 2>/dev/null || true
rm -rf "${TMP_MNT}"
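Invocation might look something like this - the variable values are placeholders, and the script name is whatever you save it as:

$ ZFS_ROOT="tank" \
  REMOTE_USER="backup" \
  REMOTE_HOST="pi.example.com" \
  REMOTE_PATH="/mnt/usbdisk/backups" \
  ./zfs-rsync-backup.sh

It needs to run as root (or via sudo) on the ZFS side, since it takes snapshots and mounts them.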