How to one-way mirror an entire zfs pool to another zfs pool


I have one zfs pool containing several zvols and datasets, some of which are nested.
All datasets and zvols are periodically snapshotted by zfs-auto-snapshot.
All datasets and zvols also have some manually created snapshots.

I have set up a remote pool on which, due to lack of time, the initial copy over a local high-speed network via zfs send -R did not complete (some datasets are missing, and some datasets have outdated or missing snapshots).

Now the pool is physically remote over a slow connection, and I need to periodically sync the remote pool with the local pool: data present in the local pool must be copied to the remote pool, data gone from the local pool must be deleted from the remote pool, and data present in the remote pool but not in the local pool must be deleted from the remote pool. By 'data' I mean zvols, datasets, and snapshots.

If I were doing this between two regular filesystems using rsync, it would be "-axPHAX --delete" (that's what I actually do to back up some systems).

How do I set up a synchronizing task so that the remote pool's zvols & datasets (including their snapshots) stay in sync with the local zvols, datasets & snapshots?

I would like to avoid transferring over ssh because of its low throughput; I'd prefer mbuffer or iSCSI instead.

Best Answer

Disclaimer: As I've never used zvols, I cannot say whether they behave any differently in replication than normal filesystems or snapshots. I assume they do, but do not take my word for it.


Your question is actually multiple questions, so I'll try to answer them separately:

How to replicate/mirror a complete pool to a remote location

You need to split the task into two parts: first, the initial replication has to be completed; afterwards, incremental replication is possible, as long as you do not mess with your replication snapshots. To enable incremental replication, you need to preserve the last replication snapshots; everything before that can be deleted. If you delete the previous snapshot, zfs recv will complain and abort the replication. In that case you have to start all over again, so try not to do this.
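
If you are ever unsure whether the last replication snapshot still exists on both sides before starting an incremental send, a quick check could look like this (a minimal sketch; the pool name tank, the snapshot prefix and the host name remotehost are placeholders matching the script further down):

# Compare the newest replication snapshot on source and destination;
# only start an incremental send if they match.
last_local=$(zfs list -H -o name -t snapshot -r tank | grep '^tank@DO_NOT_DELETE_remote_replication_' | tail -n 1)
last_remote=$(ssh remotehost zfs list -H -o name -t snapshot -r tank | grep '^tank@DO_NOT_DELETE_remote_replication_' | tail -n 1)
if [ "$last_local" = "$last_remote" ]; then
    echo "replication snapshots match: $last_local"
else
    echo "mismatch (local: $last_local, remote: $last_remote), do not send incrementally" >&2
fi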

If you just need the correct options, they are:

  • zfs send:
    • -R: send everything under the given pool or dataset (recursive replication, needed all the time, includes -p). Also, when receiving, all deleted source snapshots are deleted on the destination.
    • -I: include all intermediate snapshots between the last replication snapshot and the current replication snapshot (needed only with incremental sends)
  • zfs recv:
    • -F: force the receive, rolling back the target if necessary; with recursive (-R) streams, datasets and snapshots deleted on the source are also deleted on the destination
    • -d: discard the name of the source pool and replace it with the destination pool name (the rest of the filesystem paths will be preserved, and if needed also created)
    • -u: do not mount filesystem on destination
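
Put together, the basic shape of a full and of an incremental transfer looks roughly like this (just a sketch; the pool name tank, the snapshot names and the host remotehost are placeholders):

# Full (initial) replication of the whole pool:
zfs send -R tank@repl_1 | ssh remotehost zfs recv -Fdu tank
# Incremental replication, including every snapshot created in between:
zfs send -R -I tank@repl_1 tank@repl_2 | ssh remotehost zfs recv -Fdu tank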

If you prefer a complete example, here is a small script:

#!/bin/sh

# Setup/variables:

# Each snapshot name must be unique, timestamp is a good choice.
# You can also use Solaris date, but I don't know the correct syntax.
snapshot_string=DO_NOT_DELETE_remote_replication_
timestamp=$(/usr/gnu/bin/date '+%Y%m%d%H%M%S')
source_pool=tank
destination_pool=tank
new_snap="$source_pool"@"$snapshot_string""$timestamp"
destination_host=remotehostname

# Initial send (do this part only once):

# Create first recursive snapshot of the whole pool.
zfs snapshot -r "$new_snap"
# Initial replication via SSH.
zfs send -R "$new_snap" | ssh "$destination_host" zfs recv -Fdu "$destination_pool"

# Incremental sends (do this part on every subsequent run, not together with the initial send):

# Get old snapshot name.
old_snap=$(zfs list -H -o name -t snapshot -r "$source_pool" | grep "$source_pool"@"$snapshot_string" | tail --lines=1)
# Create new recursive snapshot of the whole pool.
zfs snapshot -r "$new_snap"
# Incremental replication via SSH.
zfs send -R -I "$old_snap" "$new_snap" | ssh "$destination_host" zfs recv -Fdu "$destination_pool"
# Delete older snaps on the local source (grep -v inverts the selection)
delete_from=$(zfs list -H -o name -t snapshot -r "$source_pool" | grep "$snapshot_string" | grep -v "$timestamp")
for snap in $delete_from; do
    zfs destroy "$snap"
done
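
To run this periodically, a simple cron entry is enough. This is only a sketch - the path /usr/local/sbin/zfs-replicate.sh is made up and assumes you have put the incremental part of the script above into its own file:

# Run the incremental replication every night at 02:30.
30 2 * * * /usr/local/sbin/zfs-replicate.sh >> /var/log/zfs-replicate.log 2>&1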

Use something faster than SSH

If you have a sufficiently secured connection, for example an IPSec or OpenVPN tunnel and a separate VLAN that only exists between sender and receiver, you may switch from SSH to unencrypted alternatives like mbuffer as described here, or you could use SSH with weak/no encryption and disabled compression, which is detailed here. There was also a website about recompiling SSH to be much faster, but unfortunately I don't remember the URL - I'll edit it in later if I find it.
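
With mbuffer, the usual pattern is to start a listening mbuffer on the receiver and pipe the send stream into a connecting mbuffer on the sender. A sketch (port 9090 and the buffer sizes are arbitrary, the snapshot variables are reused from the script above, and the link must already be trusted or tunnelled as described):

# On the receiving host: listen on TCP port 9090 and feed the stream into zfs recv.
mbuffer -s 128k -m 1G -I 9090 | zfs recv -Fdu tank
# On the sending host: pipe the replication stream to the receiver.
zfs send -R -I "$old_snap" "$new_snap" | mbuffer -s 128k -m 1G -O remotehostname:9090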

For very large datasets and slow connections, it may also be useful to do the first transmission via hard disk (use an encrypted disk to store the zpool and transmit it in a sealed package via courier, mail, or in person). As the method of transmission does not matter for send/recv, you can pipe everything to the disk, export the pool, send the disk to its destination, import the pool, and then transmit all incremental sends via SSH.
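
A rough sketch of that disk-based seeding, assuming a pool called transferpool created on the transport disk (both that pool name and the snapshot name seed are placeholders):

# On the source: receive the full stream into the pool on the transport disk.
zfs snapshot -r tank@seed
zfs send -R tank@seed | zfs recv -Fdu transferpool
zpool export transferpool
# At the destination, after shipping the disk: import it and replay into the real pool.
zpool import transferpool
zfs send -R transferpool@seed | zfs recv -Fdu tank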

The problem with messed up snapshots

As stated earlier, if you delete/modify your replication snapshots, you will receive the error message

cannot send 'pool/fs@name': not an earlier snapshot from the same fs

which means either your command was wrong or you are in an inconsistent state where you must remove the snapshots and start all over.

This has several negative implications:

  1. You cannot delete a replication snapshot until the new replication snapshot has been successfully transferred. As these replication snapshots include the state of all other (older) snapshots, the free space of deleted files and snapshots will only be reclaimed when the replication finishes. This may lead to temporary or permanent space problems on your pool, which you can only fix by restarting or finishing the complete replication procedure.
  2. You will have many additional snapshots, which slows down the list command (except on Oracle Solaris 11, where this was fixed).
  3. You may need to protect the snapshots against (accidental) removal by anything other than the script itself (see the zfs hold sketch right after this list).
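
For point 3, one option is zfs hold: a held snapshot cannot be destroyed until the hold is released. A small sketch, reusing the variable names from the script above and a made-up hold tag repl_hold:

# Place a recursive hold on the new replication snapshot so nothing else can destroy it ...
zfs hold -r repl_hold "$new_snap"
# ... and release the hold on the previous one once the next replication has succeeded,
# so the cleanup loop can delete it.
zfs release -r repl_hold "$old_snap"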

There is a possible solution to these problems, but I have not tried it myself. You could use zfs bookmark, a new feature in OpenSolaris/illumos that was created specifically for this task. This would free you from snapshot management. The only downside is that, at present, it only works for single datasets, not recursively. You would have to save a list of all your old and new datasets and then loop over them, bookmarking, sending and receiving them, and then updating the list (or a small database, if you prefer).
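
An untested sketch of how such a per-dataset bookmark loop might look (the dataset tank/data, the host remotehostname and the snapshot/bookmark names are placeholders):

# First run: full send, then keep only a bookmark of the replication snapshot.
zfs snapshot tank/data@repl_1
zfs send tank/data@repl_1 | ssh remotehostname zfs recv -Fdu tank
zfs bookmark tank/data@repl_1 tank/data#repl_1
zfs destroy tank/data@repl_1
# Later runs: incremental send from the bookmark, then roll the bookmark forward.
zfs snapshot tank/data@repl_2
zfs send -i tank/data#repl_1 tank/data@repl_2 | ssh remotehostname zfs recv -Fdu tank
zfs bookmark tank/data@repl_2 tank/data#repl_2
zfs destroy tank/data@repl_2
zfs destroy tank/data#repl_1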

If you try the bookmark route, I would be interested to hear how it worked out for you!
