Linux – ZFS on multiple partitions on one disk

backup, hard drive, linux, partitioning, zfs

I have a very large external drive that I want to use for backups. Part of the backups are for Windows partitions that need to be accessible from Windows; part are backups of some Linux partitions.

Since I cannot predict the storage sizes exactly, I thought about creating several partitions, starting with one NTFS and one ZFS. If I run out of disk space on ZFS, I would simply add another of the spare partitions to the ZFS volume. If NTFS needs more space, I would resize it (or, if that is not possible, recreate it, which means copying all the data again).

  • Is this setup recommendable or even possible?
  • Is there a better method to make the used disk space somewhat flexible?
  • Does this end in seek hell, or can the ZFS volume consist of multiple partitions on one disk without degrading performance (without spare copies)?
  • Are there alternative solutions to the problem?

UPDATE: I mounted the complete disk as an encrypted loopback volume using cryptsetup. I then created a huge ZFS filesystem on it. If I set copies=2, does that end in seek hell as well, or is there some clever ZFS mechanism that will store copies of all files on the same disk using a buffer? [Assumption: seek hell as well, plus only one copy, because you would need several devices for several copies.]
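
For reference, the setup looks roughly like this (device and pool names here are placeholders, not the exact ones I used):

    # Encrypt the whole disk and open the mapping.
    cryptsetup luksFormat /dev/sdX
    cryptsetup open /dev/sdX backup_crypt

    # Create a single ZFS pool (with its default file system) on the mapping.
    zpool create backup /dev/mapper/backup_crypt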

Best Answer

Is this setup recommendable or even possible?

It is technically possible. I'm not sure what the limits are to growing an NTFS file system, but I suspect you can grow the NTFS file system from the beginning of the disk inwards, and I know that you can grow a ZFS pool by adding arbitrary partitions, so you could grow that from the end of the disk inwards. Doing so will cause the two to meet somewhere in the middle.
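
Mechanically, growing each side would look roughly like this (device names, partition numbers and the pool name are placeholders, and the NTFS partition itself has to be enlarged with a partitioning tool first):

    # Grow the NTFS file system to fill its (already enlarged) partition.
    ntfsresize /dev/sdX1

    # Grow the ZFS pool by adding a spare partition as a new top-level vdev.
    # Depending on your ZFS version this may be impossible to undo, so be sure.
    zpool add tank /dev/sdX3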

Recommended, though? I would say absolutely not!

Does this end in seek hell, or can the ZFS volume consist of multiple partitions on one disk without degrading performance (without spare copies)?

ZFS is seek-heavy to begin with, because of its copy-on-write design. (This is exacerbated if you use snapshots. Snapshots happen to be a ZFS feature that lots of people really like, even if they don't really use any of ZFS' other features, but they come at a significant cost in terms of risk of data fragmentation, particularly if you make many small writes.)

When you add additional storage to an existing pool, which you would with your proposed scheme, the data already stored is not automatically rebalanced. Instead, ZFS writes new data blocks to vdevs that have the most free space, which causes the data to be distributed across all vdevs over time as it is written. In your case, this means that data will be distributed roughly as you expect: newly written data, including changes to existing files, is more likely to be written to the newly added vdev than to the previously existing vdev(s). Since all vdevs are on the same physical disk, there is no real reliability gain from this; if the single disk dies, then your pool dies, and takes all your data with it.
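
You can watch this allocation behaviour yourself: zpool list -v shows size, allocated and free space per vdev, so after adding a partition you will see new writes land mostly on the emptier vdev (pool and device names are placeholders):

    # Add another partition on the same disk as a new top-level vdev.
    zpool add tank /dev/sdX4

    # Per-vdev size, allocated and free space; re-run after writing data
    # to see how new blocks favour the vdev with the most free space.
    zpool list -v tank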

ZFS also tries to distribute the data evenly within the vdev, on the assumption that this reduces the risk of localized physical damage or logical errors affecting all copies equally. This is one driving reason behind its Merkle tree on-disk design, and is a likely reason why it tries to place copies as far apart as possible, preferably on unrelated vdevs.

There is, as far as I know, currently no native support in ZFS for rebalancing data between vdevs after adding additional storage. Btrfs has the btrfs balance command, but ZFS has nothing similar. You can get somewhat close by rewriting the data (using zfs send ... | zfs recv ...), but the rebalancing is only a side effect of the rewrite, not something ZFS does for you.
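
If you want to force such a rewrite anyway, the usual workaround looks something like the following, which sends the data into a new file system on the same pool and then swaps the two (file system names are hypothetical, and you temporarily need enough free space for a second copy):

    # Snapshot the existing file system.
    zfs snapshot tank/data@rebalance

    # Rewrite the data; the newly written blocks are spread across all
    # vdevs that currently exist in the pool.
    zfs send tank/data@rebalance | zfs recv tank/data_new

    # After verifying the copy, retire the old file system and rename.
    zfs destroy -r tank/data
    zfs rename tank/data_new tank/data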

Bottom line, I doubt your proposed setup would be significantly worse in terms of disk seeks than a similarly set up ZFS on a single partition.

I then created a huge ZFS filesystem on it. If I set copies=2, does that end in seek hell as well, or is there some clever ZFS mechanism that will store copies of all files on the same disk using a buffer? [Assumption: seek hell as well, plus only one copy, because you would need several devices for several copies.]

First, please keep in mind the difference between ZFS pools and ZFS file systems. You created a pool, which by default contains a single file system with the same name as the pool. The pool dictates the properties of the physical storage, such as vdev configuration and minimum block size (known as ashift in ZFS), and is administered using the zpool utility. The file system dictates the properties of the logical storage, such as compression, quotas, mount points, and checksumming, and is administered using the zfs utility. This is in contrast to, e.g., Btrfs, which lumps the two together.
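
To make the distinction concrete, here is a rough sketch of which tool touches what (pool name, device and property values are just examples):

    # Pool level (physical layout, minimum block size): the zpool utility.
    zpool create -o ashift=12 tank /dev/mapper/backup_crypt
    zpool status tank

    # File system level (compression, quotas, mount point, copies): the zfs utility.
    zfs create tank/backup
    zfs set compression=lz4 tank/backup
    zfs set quota=500G tank/backup
    zfs get compression,quota,copies tank/backup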

Second, let me briefly introduce you to how the copies ZFS file system property works. copies specifies redundancy above and beyond the physical redundancy and is, in effect, similar to making several user-visible copies of a single file (but doesn't break the user's mental model if deduplication is in use on the file system). While it applies to all vdev redundancy types, this is easiest to illustrate with mirrors: a two-way mirror vdev will, by the simple property of being a mirror, store two identical copies of your data. If you have also set copies=2, then each of the two disks in the mirror pair will also hold two copies of your data, for a total of four copies of the bits being stored and a grand total of about 25% usable storage (when compared to the amount of raw storage available). The simple explanation breaks down somewhat with raidzN vdevs, but the result is the same: multiple copies of the same bits are stored on disk, such that if one goes bad, another can be used.
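
Note that copies is a per-file-system property and, like most ZFS properties, only affects data written after it is set; existing blocks are not rewritten. Setting it on an existing file system looks like this (names are placeholders):

    # Store two copies of every data block written to this file system
    # from now on; previously written blocks keep the copies they had.
    zfs set copies=2 tank/backup
    zfs get copies tank/backup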

By default, a single copy of user data is stored, and two copies of file system metadata are stored. By increasing copies, you adjust this behavior such that copies copies of user data (within that file system) are stored, and copies plus one copies of file system metadata (within that file system) are stored. For best effect, if you want to set copies to a value greater than one, you should do so when you create the pool using zpool create -O copies=N, to ensure that additional copies of all root file system metadata are stored from the start.
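
A sketch of that, with a hypothetical pool name:

    # -O sets a file system property on the pool's root file system at
    # creation time, so the extra metadata copies exist from the start.
    zpool create -O copies=2 tank /dev/mapper/backup_crypt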

Under normal read operations, extra copies only consume storage space. When a read error occurs, if there are redundant, valid copies of the data, those copies can be used to satisfy the read request and to transparently rewrite the broken copy. Read errors can be either outright I/O errors, or, if checksumming is turned on (which it is by default, and which you really should leave enabled unless you have a very unusual workload), data coming back from the disk as something other than what was originally written (a checksum mismatch).
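
You can see the results of this self-healing in the pool statistics: read, write and checksum error counters are tracked per vdev, and a scrub walks every copy of every block and repairs what it can (pool name is a placeholder):

    # Per-vdev read/write/checksum error counters.
    zpool status -v tank

    # Read and verify every copy of every block, repairing broken copies
    # from redundant ones where possible.
    zpool scrub tank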

However, during writes, all copies must be updated to ensure that they are kept in sync. Because ZFS aims to place copies far away from each other, this introduces additional seeking. Also, don't forget its Merkle tree design, with metadata blocks placed some physical distance away from the data blocks (to guard against, for example, a single write failure corrupting both the checksum and the data). I believe that ZFS aims to place copies at least 1/8 of the vdev away from each other, and the metadata block containing the checksum for a data block is always placed some distance away from the data block.

Consequently, setting copies greater than 1 does not significantly help or hurt performance while reading, but reduces performance while writing in relation to the number of copies requested and the IOPS (I/O operations per second) performance of the underlying storage.
