I have tested this with ZFS, and write performance is about half what it should be, because ZFS distributes reads and writes across all vdevs (thereby splitting I/O between several locations on the same disk). The speed is therefore limited by the disk with the most partitions. Read speed appears to equal the disk bandwidth. Note that a pair of ZFS partitions on two disks has roughly double the read speed of either single disk, because ZFS can read from the two disks in parallel.
Using MD LINEAR arrays or LVM to create the two halves yields twice the write performance of the ZFS proposal above, but has the disadvantage that LVM and MD have no idea where the data is actually stored. In the event of a disk failure or upgrade, one side of the array must be entirely destroyed and resynced/resilvered, followed by the other side (i.e. the resync/resilver has to copy 2 × the size of the array).
Therefore it seems that the optimal solution is to create a single ZFS mirror vdev across two LVM or MD LINEAR devices, each combining disks into equal-sized "halves". This gives roughly twice the read bandwidth of any single disk, while write bandwidth equals the individual disk bandwidth.
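As a rough sketch, the hybrid layout might be set up as follows. All device names here are hypothetical placeholders; adjust them (and the partitioning) to your actual disks:

```shell
# Assumption: sda is one large 2 TB disk; sdb and sdc are 1 TB each.
# Build one "half" by concatenating the two small disks with MD LINEAR.
mdadm --create /dev/md0 --level=linear --raid-devices=2 /dev/sdb /dev/sdc

# Mirror the large disk against the concatenated half in ZFS.
zpool create tank mirror /dev/sda /dev/md0
```

ZFS then handles redundancy and checksumming at the mirror level, while MD only provides the plain concatenation, so resilvers copy each block exactly once.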
Using BTRFS raid1 instead of ZFS also works, but has half the read bandwidth, because ZFS spreads its reads to double the bandwidth while BTRFS apparently does not (according to my tests). BTRFS has the advantage that partitions can be shrunk, which is not possible with ZFS (so if you have lots of empty space after a failure, with BTRFS you can rebuild a smaller redundant array by shrinking the filesystem and then rearranging the disks).
This is tedious to do by hand but easy with some good scripts.
You can add devices to a pool after it has been created, however not really in the way you seem to envision.
With ZFS, the only redundant configuration that you can add devices to is the mirror. It is currently not possible to grow a raidzN vdev with additional devices after it has been created. Adding devices to a mirror increases the redundancy but not the available storage capacity.
It is possible to work around this to some degree by creating a raidzN vdev of the desired configuration using sparse files for the redundancy devices, then deleting the sparse files before populating the vdev with data. Once you have actual drives available, you `zpool replace` the (now non-existent) sparse files with them. The problem with using this approach as more than a migration path toward a more ideal solution is that the pool will constantly show as DEGRADED, which means you have to look much more closely to recognize any actual degradation of the storage; hence, I don't really recommend it as a permanent solution.
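A rough sketch of this workaround (pool and device names are hypothetical; the sparse file must be at least as large as the real disks):

```shell
# Create a sparse file to stand in for the missing drive.
truncate -s 2T /var/tmp/fake0

# Build the raidz1 vdev from two real disks plus the sparse file.
zpool create tank raidz1 /dev/sda /dev/sdb /var/tmp/fake0

# Take the fake device offline and delete it BEFORE writing any data,
# so nothing real ever lands on it. The pool now shows DEGRADED.
zpool offline tank /var/tmp/fake0
rm /var/tmp/fake0

# Later, when a real drive arrives, replace the missing member.
zpool replace tank /var/tmp/fake0 /dev/sdc
```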
Naively adding devices to a ZFS pool actually comes at a serious risk of decreasing the pool's resilience to failure, because all top-level vdevs must be functional in order for the pool to be functional. These top-level vdevs can have redundant configurations, but do not need to; it is perfectly possible to run ZFS in a JBOD-style configuration, in which case a single device failure is highly likely to bring down your pool. (A bad idea if you can avoid it, but it still gives you many ZFS capabilities even in a single-drive setup.) Basically, a redundant ZFS pool is made up of a JBOD combination of one or more redundant vdevs; a non-redundant ZFS pool is made up of a JBOD combination of one or more JBOD vdevs.
Adding top-level vdevs also doesn't cause ZFS to balance the data onto the new devices. It eventually happens for data that gets rewritten (because of the file system's copy-on-write nature and its favoring of vdevs with more free space), but it doesn't happen for data that just sits there and is read but never rewritten. You can make it happen by rewriting the data (for example by using `zfs send | zfs recv`, assuming deduplication is not turned on for the pool), but it does require you to take specific action.
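A sketch of rebalancing by rewriting (dataset names are hypothetical; this requires enough free space for a full second copy while both datasets exist):

```shell
# Rewrite a dataset so its blocks get reallocated across all vdevs.
zfs snapshot tank/data@rebalance
zfs send tank/data@rebalance | zfs recv tank/data_new

# After verifying the copy, swap the datasets.
zfs destroy -r tank/data
zfs rename tank/data_new tank/data
```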
Based on the numbers in your post, you have:
- four 2 TB drives
- two 4 TB drives
Since you say that you want a redundant configuration, given these constraints (particularly the set of drives available) I'd probably suggest grouping the drives as mirror pairs. That would give you a pool layout like this:
- tank
  - mirror-0 (2 TB + 2 TB)
  - mirror-1 (2 TB + 2 TB)
  - mirror-2 (4 TB + 4 TB)
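Such a layout could be created in a single command (device names are hypothetical placeholders):

```shell
# Three two-way mirrors, striped together by the pool:
# two 2 TB pairs and one 4 TB pair.
zpool create tank \
    mirror /dev/sda /dev/sdb \
    mirror /dev/sdc /dev/sdd \
    mirror /dev/sde /dev/sdf
```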
This setup will have a user-accessible storage capacity of approximately 8 TB, give or take metadata overhead (you have two mirrors providing 2 TB each, plus one mirror providing 4 TB, for a total of 8 TB). You can add more mirror pairs later to increase the pool capacity, or replace a pair of 2 TB drives with 4 TB drives (though be aware that resilvering after a drive failure in a mirror pair puts severe stress on the remaining drive(s), in the case of two-way mirrors greatly increasing the risk of complete failure of the mirror).

The downside of this configuration is that the pool will be practically full right from the beginning, and the general suggestion is to keep ZFS pools below about 75% full. If your data is mostly only ever read, you can get away with less margin, but performance will suffer greatly, particularly on writes. If your dataset is write-heavy, you definitely want some margin for the block allocator to work with. So this configuration will "work", for some definition of the word, but will be suboptimal.
Since you can freely add additional mirror devices to a vdev, with some planning it should be possible to do this in such a way that you don't lose any of your data.
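For example, to turn a two-way mirror into a three-way mirror before retiring an old drive (device names are hypothetical):

```shell
# Attach a new device alongside an existing mirror member;
# ZFS resilvers the new device automatically.
zpool attach tank /dev/sda /dev/sdg

# Wait for the resilver to finish, then the old drive can be
# detached without ever dropping below two copies of the data.
zpool status tank
zpool detach tank /dev/sda
```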
You could in principle eventually replace mirror-0 and mirror-1 above with a single raidz1 vdev made up of the four 2 TB HDDs (giving you 6 TB of usable storage capacity rather than 4 TB, and the ability to survive any one 2 TB HDD failure before your data is at risk), but that means committing three of those drives to ZFS up front. Given your usage figures, it sounds like this might be possible with some shuffling of data. I wouldn't recommend mixing vdevs of different redundancy levels, though, and I believe the tools even force you in that case to say, effectively, "yes, I really know what I'm doing".
Mixing different sized drives in a pool (and especially in a single vdev, except as a migration path to larger-capacity drives) is not really recommended; in both mirror and raidzN vdev configurations, the smallest constituent drive in the vdev determines the vdev capacity. Mixing vdevs of different capacity is doable but will lead to an unbalanced storage setup; however, if most of your data is rarely read, and when read is read sequentially, the latter should not present a major problem.
The best configuration would probably be to get an additional three 4 TB drives, then create a pool consisting of a single raidz2 vdev of those five 4 TB drives, and effectively retire the 2 TB drives. Five 4 TB drives in raidz2 will give you 12 TB of storage capacity (leaving a good bit of room to grow), and raidz2 gives you the ability to survive the failure of any two of those drives, leaving the mirror setup in the dust in terms of resilience to disk problems. With some planning and data shuffling, it should be easy to migrate to such a setup with no data loss. A five-drive raidz2 is also near optimal in terms of storage overhead according to tests performed by one user and published on the ZFS On Linux discussion list back in late April, showing usable storage capacity at 96.4% of optimal when using 1 TB devices, beaten only by a six-drives-per-vdev configuration, which gave 97.3% in the same test.
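The usable capacity of a raidzN vdev is roughly (number of drives − parity drives) × smallest drive size. A quick sanity check of the figure above, followed by the creation command (device names are placeholders):

```shell
# Usable capacity of raidz2 over five 4 TB drives: (5 - 2) * 4 = 12 TB.
drives=5
parity=2
size_tb=4
echo "usable: $(( (drives - parity) * size_tb )) TB"

# Creating the pool would then look like (placeholder device names):
# zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
```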
I do realize that five 4 TB drives might not be practical in a home setting, but keep in mind that ZFS is an enterprise file system, and many of its limitations (particularly in this case, the limitations on growing redundant vdevs after creation) reflect that.
And always remember, no type of RAID is backup. You need both to be reasonably secure against data loss.
Best Answer
The short answer is no - there is no direct migration path for LVM -> ZFS. Your research leads you down the right path - you'll need 3 additional drives, create the zpool with those drives, use something like rsync to copy the data over, then add the existing drives to the zpool. If you're concerned about data protection, you could put the two remaining drives in a mirror, so you'd have a pool with a raidz and a mirror. This isn't really ideal, and does not get you to the 4+1 RAIDZ config, but it does at least utilize the disks.
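A sketch of that migration (pool name, mount points, and device names are all hypothetical):

```shell
# Build the new pool from the three new 4 TB drives.
zpool create tank raidz1 /dev/sdc /dev/sdd /dev/sde

# Copy the data off the old LVM volume, preserving permissions,
# hard links, ACLs, and extended attributes.
rsync -aHAX /mnt/old-lvm/ /tank/

# Once the old drives are freed from LVM, add them as a mirror vdev.
zpool add tank mirror /dev/sda /dev/sdb
```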
If you really want the 4+1 config, then you'll probably need to find a way to attach a couple of the existing disk drives to the system externally - USB or something like that - then add a couple more 4TB drives to the system, build the pool, and migrate the data using rsync.
Also, with LVM, be careful that you know exactly which disks your data is located on before you start removing them, and make sure you use the commands in the right order. LVM has 'pvmove' to make it easy to dynamically move data off of disks and onto others, so as long as you're using that you should be good.
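For example, to evacuate a physical volume before pulling it out of the system (volume group and device names are hypothetical):

```shell
# See which PVs hold data, and for which volume group.
pvs -o pv_name,vg_name,pv_used

# Move all allocated extents off /dev/sdb1 onto the VG's other PVs.
pvmove /dev/sdb1

# Remove the now-empty PV from the volume group, then wipe its label.
vgreduce myvg /dev/sdb1
pvremove /dev/sdb1
```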