FreeBSD – way to create cow-copies in ZFS

freebsd, zfs

I am trying to make cow-copies of some files/directories, but all of the approaches I know of seem sub-optimal.

For example, btrfs can, with the use of cp --reflink=auto quickly generate cow-copies of files.
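For reference, the btrfs operation being compared against is just a normal copy with GNU cp (the paths here are placeholders):

    # On btrfs, the new file shares the original's extents until
    # either copy is modified (copy-on-write)
    cp --reflink=auto /data/bigfile.img /data/bigfile-copy.img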

What I have tried:

  1. Symlinks: No good. Renamed file, broken link.
  2. Hardlinks: Better, but still no good. Changes to one file will change the other, and I don't necessarily want the other file changed.
  3. Create a snapshot of the dataset, then clone the snapshot: This can work, but not well. Often I'm not looking for a copy of the whole dataset, or for the copies to act like another dataset. Then there are the parent/child relationships between the clone/snapshot/original, which as I understand it are hard, if not impossible to break.
  4. Using zfs send/receive with dedup enabled, replicate the dataset to a new dataset: This avoids the parent/child relationships of using a clone, but still needlessly creates another dataset, and still suffers from the slowness of having to read the files in full so the blocks can be referenced again instead of written (see the sketch after this list).
  5. Copy the files and let dedup do its job: This works, but it is slow, because the files still have to be read in full and the blocks then referenced again instead of being written.
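To be concrete, a rough sketch of what option 4 looks like in practice, assuming a pool called tank and placeholder dataset names (this is just to illustrate the commands involved, not a recommendation):

    # Dedup is a property; it can be set on the pool's root dataset or lower
    zfs set dedup=on tank

    # Snapshot the source, then replicate it into a new dataset
    zfs snapshot tank/data@replica
    zfs send tank/data@replica | zfs receive tank/data-copy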

The slowness of zfs send/receive, and of physically copying or rsyncing, is further exacerbated because most of the data is stored compressed: it has to be decompressed during the read, then compressed again before dedup kicks in and references the duplicate blocks.

In all of my research, I have not been able to find anything remotely resembling the simplicity of --reflink in btrfs.

So, is there a way to create cow-copies in ZFS? Or is "physically" copying and letting dedup do its job the only real option?

Best Answer

I think option 3 as you have described above is probably your best bet. The biggest problem with what you want is that ZFS really only handles this copy-on-write at the dataset/snapshot level.

I would strongly suggest avoiding dedup unless you have verified that it works well in your exact environment. I have personal experience with dedup working great until one more user or VM store is moved in, at which point it falls off a performance cliff and causes a lot of problems. Even if it looks like it's working great with your first ten users, the machine might fall over when you add the eleventh (or twelfth, or thirteenth, or whatever). If you want to go this route, make absolutely sure you have a test environment that exactly mimics your production environment, and that it works well in that environment.
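If you do want to evaluate dedup, ZFS can estimate the outcome before you enable it: zdb -S walks an existing pool and prints a simulated dedup table along with the projected dedup ratio (the pool name here is a placeholder):

    # Simulate dedup on an existing pool; read-only, but it reads
    # every block, so it can take a long time
    zdb -S tank

    # After dedup is actually enabled, the realized ratio shows up
    # in the DEDUP column
    zpool list tank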

Back to option 3, you'll need to set up a specific dataset to hold each of the file system trees that you want to manage in this way. Once you've got it set up and initially populated, take your snapshots (one per copy that will differ slightly) and turn them into clones. Never touch the original dataset again.
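In command form, that workflow would look roughly like this (pool and dataset names are placeholders):

    # One dataset holds the pristine tree; populate it once and
    # then leave it alone
    zfs create tank/golden

    # Take a snapshot, then clone it once per copy that will diverge
    zfs snapshot tank/golden@base
    zfs clone tank/golden@base tank/copy1
    zfs clone tank/golden@base tank/copy2

    # Optionally, zfs promote reverses a clone's parent/child
    # relationship with its origin; it does not remove the dependency
    zfs promote tank/copy1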

Yes, this solution has problems. I'm not saying it doesn't, but given the restrictions of ZFS, it's still probably the best one. I did find this reference to someone using clones effectively: http://thegreyblog.blogspot.com/2009/05/sparing-disk-space-with-zfs-clones.html

I'm not really familiar with btrfs, but if it supports the options you want, have you considered setting up a separate server just to support these datasets, using Linux and btrfs on that server?
