Linux – Combining SSD + HDD into single fast, large partition

linux zfs

So I have a desktop with a fast SSD and a large HDD. I am trying to get a well-configured, large, fast zpool out of the pair.

I have read that I can carve out separate partitions on the SSD for the ZIL and L2ARC, which would seem to do what I want, except that I have to manually decide how big each partition should be. What I don't like about this is that it's somewhat involved and potentially hard to reconfigure if I need to change the partitions, and it sounds like the maximum filesystem size is limited by the HDD alone, since the intent is that everything on the ZIL and L2ARC eventually has to make it to disk anyway. It's also not clear whether the L2ARC is retained after a system reboot or has to be populated again. Finally, it seems inefficient to have to copy data from the ZIL to the L2ARC if both are on the same SSD, or even to the HDD when there is currently no pressure on how much hot data I need on the SSD.

Alternatively, it seems I can also just have one partition on the SSD and one on the HDD and add both to a zpool directly, with no redundancy. I have tried this, and noticed sustained read/write speeds greater than what the HDD alone can muster. But I don't know whether everything is just going to the SSD for now, and everything will go to the HDD later once the SSD has filled up. Ideally, I would like ZFS to transparently shuffle data around behind the scenes so that the hot data always stays on the SSD, similarly to what the L2ARC does, while keeping a sensible amount of empty space on the SSD for new writes. The ZIL should be automatically managed to be the right size and preferably live on the SSD as much as possible.
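For reference, the no-redundancy setup described above boils down to a single command; the pool name and device paths below are hypothetical placeholders:

```shell
# One SSD partition + one HDD partition striped into a single pool,
# with no redundancy ("tank" and the device paths are placeholders):
zpool create tank /dev/disk/by-id/ssd-part1 /dev/disk/by-id/hdd-part1
```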
If I go the manually configured ZIL + L2ARC route, it seems the ZIL only needs to be about (10 sec × HDD write speed) big. Keeping the ZIL that small maximizes the size of the L2ARC, which is good. But what happens if I add a second striped disk, which effectively doubles my HDD speed (and capacity)?
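A quick back-of-the-envelope for that sizing rule, assuming a hypothetical sustained HDD write speed of 200 MB/s (the actual number depends on your drive):

```shell
# ZIL/SLOG sizing sketch: the log only needs to absorb the writes that
# accumulate between transaction-group commits (~10 seconds here).
hdd_write_mb_per_s=200   # assumed HDD write speed -- measure your own
txg_window_s=10
slog_mb=$((hdd_write_mb_per_s * txg_window_s))
echo "${slog_mb} MB"     # prints: 2000 MB
```

By the same arithmetic, striping in a second HDD that doubles write throughput would double the suggested size to roughly 4 GB.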

Summary of questions if using SSD for ZIL + L2ARC:

  1. If I set up SSD for ZIL + L2ARC, how hard is it to re-set it up with different partition sizes?
  2. If I use SSD for L2ARC, is its capacity included in total available pool capacity, or is the pool capacity limited by HDD alone?
  3. Is L2ARC retained after system reboot, or does it have to be re-populated?
  4. Does data have to be copied from ZIL to L2ARC even if both are on same physical SSD?
  5. If the ZIL is on the SSD and there is still plenty of room for more intents to be logged, does the ZIL still automatically get flushed to the HDD? If so, when and under what circumstances?
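On question 1: resizing is relatively painless, because log and cache vdevs can be added to and removed from a live pool. A sketch, with a hypothetical pool name and device paths:

```shell
# Attach SSD partitions to an existing pool as SLOG and L2ARC
# ("tank" and the device paths are placeholders):
zpool add tank log   /dev/disk/by-id/ssd-part1
zpool add tank cache /dev/disk/by-id/ssd-part2

# Both can be detached again at any time, so re-sizing is just
# remove -> repartition -> re-add, with no pool rebuild:
zpool remove tank /dev/disk/by-id/ssd-part1
zpool remove tank /dev/disk/by-id/ssd-part2
```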

Summary of questions if using SSD + HDD in a single zpool:

  1. ZFS obviously notices the difference in size between the SSD and HDD partitions, but does it also recognize their relative performance? In particular:
  2. How are writes distributed across the SSD and HDD when both are relatively empty?
  3. Does ZFS try to do anything smart with data shuffling once the SSD part of the zpool fills up? In particular:
  4. If the SSD part of the zpool is filled up, does ZFS ever anticipate that more writes are coming and try to move data from SSD to HDD in the background?
  5. If the SSD part of the zpool is filled up, and I start accessing a bunch of data off the HDD and not so much off the SSD, does ZFS make any effort to swap the hot data onto the SSD?
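For questions 2–5, one way to see what ZFS is actually doing is to watch per-vdev statistics while the pool is under load (the pool name here is hypothetical):

```shell
# Per-vdev I/O statistics, refreshed every 5 seconds, showing how reads
# and writes are split between the SSD and HDD vdevs:
zpool iostat -v tank 5

# Per-vdev capacity and allocation:
zpool list -v tank
```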

Finally, the most important question:

  1. Is it a good idea to set up SSD + HDD in same pool, or is there a better way to optimize my pair of drives for both speed and capacity?

Best Answer

While Marco's answer explained all the details correctly, I just want to focus on your last question/summary:

Is it a good idea to set up SSD + HDD in same pool, or is there a better way to optimize my pair of drives for both speed and capacity?

ZFS is a file system designed for large arrays with many smaller disks. Although it is quite flexible, I think it is suboptimal for your current situation and goal, for the following reasons:

  • ZFS does no reshuffling of already written data. What you are looking for is called a hybrid drive; for example, Apple's Fusion Drive lets you fuse multiple disks together and automatically selects the storage location for every block based on access history (data is moved when the system is idle, or on rewrite). With ZFS you have none of that, neither automatically nor manually: your data stays where it was initially written (or is already marked for deletion).
  • With just a single disk, you give up redundancy and self-healing. You can still detect errors, but you do not use the full capabilities of the system.
  • Putting both disks in the same pool means an even higher chance of data loss (this is effectively RAID0, after all) or corruption; additionally, your performance will be subpar because of the differing drive sizes and speeds.
  • HDD + SLOG + L2ARC is a bit better, but you need a very good SSD (better two different ones, like Marco said, though a single NVMe SSD is a good but expensive compromise), and most of the space on it is wasted: 2 to 4 GB is enough for the ZIL, and a large L2ARC only helps once your RAM is full, yet itself consumes additional RAM. This leads to a sort of catch-22: if you want to use an L2ARC, you need more RAM, but with that much RAM you can often just use the RAM itself, because it is enough. Remember, only blocks are cached, so you do not need as much space as you would assume by looking at plain files.

Now, what are the alternatives?

  • You could split into two pools, one for the system and one for data. This way you have no automatic rebalancing and no redundancy, but you get a clean system that can be extended easily and has no RAID0 problems.
  • Buy a second large HDD, make a mirror, and use the SSD as you outlined: this removes the problem of differently sized and differently fast disks, gives you redundancy, and keeps the SSD flexible.
  • Buy n SSDs and do RAIDZ1/2/3. Smaller SSDs are pretty cheap nowadays and do not suffer from slow rebuild times, making RAIDZ1 interesting again.
  • Use another file system or volume manager with hybrid capabilities, with ZFS on top if needed. This is not seen as optimal, but neither is working with two single-disk vdevs in a pool... at least you get exactly what you want, plus some of the nice things of ZFS (snapshots etc.) on top, though I wouldn't count on stellar performance.
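The first alternative (two separate pools) is the simplest to sketch; the pool names and device paths below are hypothetical:

```shell
# One pool per drive: the SSD pool holds the system and hot data,
# the HDD pool holds bulk data ("fastpool"/"datapool" are placeholders):
zpool create fastpool /dev/disk/by-id/ssd-part1
zpool create datapool /dev/disk/by-id/hdd-part1
```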