How to *efficiently* determine whether a ZFS dataset has changed since the last snapshot, as a simple yes/no question

snapshot zfs

Say I have a pool named tank and a dataset inside of it named data. There is also at least one snapshot of data named last_snapshot:

tank
tank/data
tank/data@last_snapshot

As far as I can tell, the "slow" way of finding out whether the dataset has changed is to check the output of zfs diff:

zfs diff tank/data@last_snapshot

It will show all changes to the dataset for every individual file / folder … since the last snapshot. If there were a lot of changes, this command produces a lot of output and runs for seconds or even a few minutes.
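
If only a yes/no answer is needed, the diff output does not have to be read in full. A minimal sketch, assuming a POSIX shell and the snapshot name from the example above; the function name changed_via_diff is made up for illustration. zfs diff still has to do the comparison work, so this mainly avoids handling a large amount of output rather than guaranteeing a faster run:

```shell
# Hypothetical helper (name invented for this example): succeeds with
# exit status 0 if zfs diff reports at least one change between the
# given snapshot and the current state of its dataset.
changed_via_diff() {
    # $1: full snapshot name, e.g. tank/data@last_snapshot
    # -H selects the script-friendly output; head keeps only the first line.
    [ -n "$(zfs diff -H "$1" | head -n 1)" ]
}
```

Usage would be something like: if changed_via_diff tank/data@last_snapshot; then echo changed; else echo unchanged; fi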

A faster yet (as far as I can tell) less reliable way is to look at the written property of the dataset:

tank/data written 72K -
tank/data@last_snapshot written 1,83G -

If the value of written for the dataset (NOT any of the snapshots) is somewhere between 56K and 128K, the dataset usually has not been modified since the most recent snapshot. This is much faster, but without understanding why this number can vary so much, I do not want to rely on this method.

How can I safely and quickly ask the yes/no-question: Has a dataset been changed / modified since the last snapshot?


Design idea (background): I have a fair number of datasets that receive "significant" changes only very infrequently, say once every couple of weeks. Taking a snapshot once per day and keeping the last, say, 7 quickly leads to losing the state before the last "significant" change. This is where I thought: let's only take snapshots if there was a "significant" change and keep the last 7 of those.

Best Answer

This is a good question, and the most I can do right now is upvote it.

In thinking about this, some avenues of conjecture I have attempted are:

  • snapshotting the dataset again, and comparing the createtxg ("birth") properties of the two snapshots. On one test I got lucky and had consecutive values (implying no transactions had occurred in the interim between snapA creation and snapB creation). This fact later proved worthless because further research tells me that transaction groups are numbered globally across the entire pool, not uniquely within each file system.

  • comparing the createtxg value of the snapshot to the most recent transaction group of the filesystem. I am not certain this holds, but if the snapshot's createtxg is equal to or higher than the filesystem's most recent transaction group, one might infer that the snapshot creation itself is the latest transaction committed to that filesystem. It would be fortuitous if this comparison could determine precisely whether the snapshot was "pristine."
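
For experimenting with createtxg, the values can be read in machine-parseable form. The sketch below only lists the snapshots of one dataset ordered by createtxg and picks the newest one; it does not answer the "most recent txg of the filesystem" question. The function name latest_snapshot is invented for this example, and it assumes a zfs version that accepts createtxg as a column and sort key in zfs list:

```shell
# Hypothetical helper: prints the name of the snapshot of dataset $1
# that has the highest createtxg, i.e. the most recently created one.
latest_snapshot() {
    # -H: no header, tab-separated; -p: exact numbers; -d 1: direct snapshots only
    zfs list -Hp -t snapshot -d 1 -o name,createtxg -s createtxg "$1" |
        tail -n 1 |
        cut -f 1
}
```

Called as latest_snapshot tank/data, it prints a full snapshot name such as tank/data@last_snapshot, or nothing if the dataset has no snapshots.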

At this time, the best answer I can offer is to trust the written property of the filesystem, but only within strict parameters:

1) Ensure that all pending transactions have been committed to the filesystem

sync; sleep 1; sync  # maybe I'm just being superstitious?

2) Make sure the filesystem is inactive (not mounted)

zfs umount tank/data

3) Query the exact machine-parseable value of the written attribute using the -p flag of zfs list

zfs list -Hpo written tank/data

That number needs to be exactly 0 before I would infer that the snapshot is pristine. "Small" is not good enough, if we take the zfs man page at face value:

The following native properties consist of read-only statistics about the dataset. These properties can be neither set, nor inherited. Native properties apply to all dataset types unless otherwise noted.
...

createtxg

The transaction group (txg) in which the dataset was created. Bookmarks have the same createtxg as the snapshot they are initially tied to. This property is suitable for ordering a list of snapshots, e.g. for incremental send and receive.
...

written

The amount of referenced space written to this dataset since the previous snapshot.

Further promise is offered by:

written@snapshot

The amount of referenced space written to this dataset since the specified snapshot. This is the space that is referenced by this dataset but was not referenced by the specified snapshot.
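
As a small sketch of how written@snapshot could be queried against a specific snapshot rather than just the previous one (the helper name written_since is invented here; the column syntax is the same one used in the zfs list examples further down):

```shell
# Hypothetical helper: prints the number of bytes written to dataset $1
# since its snapshot $2, where $2 is the short name (the part after the @).
written_since() {
    zfs list -Hpo "written@$2" "$1"
}
```

For the example in the question this would be called as written_since tank/data last_snapshot.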

The filesystem needs to be unmounted to avoid a race condition: between the moment you notice that the value is 0 and the moment you begin to act on that state, another process could write to the filesystem and spoil your party.
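
Steps 1 to 3, together with the exact-zero test, can be put into one function. This is only a sketch under the assumptions stated above, not a verified procedure; the name is_pristine and the exit-status convention (0 = unchanged, 1 = changed, 2 = a zfs command failed) are choices made for this example:

```shell
# Hypothetical helper: checks whether dataset $1 is unchanged since its
# most recent snapshot. Leaves the dataset unmounted on purpose so that
# nothing can write to it while the caller acts on the result.
is_pristine() {
    # Step 1: flush pending writes (same belt-and-braces sequence as above)
    sync; sleep 1; sync
    # Step 2: unmount so no process can modify the filesystem
    zfs umount "$1" || return 2
    # Step 3: read the exact byte count and insist on exactly 0
    written=$(zfs list -Hpo written "$1") || return 2
    [ "$written" -eq 0 ]
}
```

The caller is responsible for running zfs mount on the dataset again afterwards.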

If anyone knows how to query the most recently-committed transaction group number of a filesystem, I'd be obliged to know that method. A few basic Google searches haven't turned up anything I have found useful.

Another interesting property is the referenced property of the snapshot versus that of its parent filesystem.

referenced

The amount of data that is accessible by this dataset, which may or may not be shared with other datasets in the pool. When a snapshot or clone is created, it initially references the same amount of space as the file system or snapshot it was created from, since its contents are identical.

However, my hunch is that this property also leads to false hope, because the filesystem and the snapshot might both reference 10G of data, but it's not the same 10G in the actual blocks on disk, as is implied by the phrase "may or may not be shared". Matching referenced values seems more like a necessary condition than a sufficient one.

In summary, and with much humility and uncertainty, I think one has to insist on this zfs list output before one can assume a snapshot is pristine:

# zfs create -o mountpoint=/root/test w541/test
# zfs snap w541/test@snap1
# zfs list -po written,written@snap1 w541/test
WRITTEN  WRITTEN@SNAP1
      0              0

Soil the filesystem and it is no longer pristine:

# touch test/foo
# zfs list -po written,written@snap1 w541/test
WRITTEN  WRITTEN@SNAP1
  57344          57344

Roll it back, and it becomes pristine again:

# zfs rollback w541/test@snap1
# zfs list -po written,written@snap1 w541/test
WRITTEN  WRITTEN@SNAP1
      0              0

But for highest integrity, the filesystem should be unmounted, perhaps for some period of time, before querying the written property.
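
Tying this back to the design idea in the question (snapshot only after a change, keep the last 7 of those), a rough sketch could look like the following. The function names, the auto- snapshot prefix, and the timestamp format are all invented for this example. It skips the unmount step for brevity, so it inherits the race described above, and it destroys snapshots, so it should be tried on a scratch dataset first:

```shell
# Hypothetical helper: reads snapshot names from stdin, oldest first, and
# prints every name except the newest $1 ones (the candidates for removal).
snapshots_to_prune() {
    awk -v keep="$1" '{ names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Hypothetical helper: snapshots dataset $1 only if written is non-zero,
# then keeps just the newest $2 snapshots carrying the auto- prefix.
snapshot_if_changed() {
    written=$(zfs list -Hpo written "$1") || return 1
    # Exactly 0 means nothing was written since the last snapshot: do nothing
    [ "$written" -gt 0 ] || return 0
    zfs snapshot "$1@auto-$(date +%Y%m%d-%H%M%S)" || return 1
    # List direct snapshots oldest first, restrict to our own prefix, prune
    zfs list -H -t snapshot -d 1 -o name -s creation "$1" |
        grep '@auto-' |
        snapshots_to_prune "$2" |
        while read -r snap; do
            zfs destroy "$snap"
        done
}
```

A daily cron job could then call snapshot_if_changed tank/data 7 for each dataset of interest. Restricting the pruning to the auto- prefix is meant to leave manually created snapshots alone.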

I will be extremely grateful for any corrections to any mistaken assumptions, oversights, or additional insights that anyone may be kind enough to offer.
