Is this setup recommendable or even possible?
It is technically possible. I'm not sure what the limits are to growing an NTFS file system, but I suspect you can grow NTFS from the beginning of the disk inwards, and I know that you can grow a ZFS pool by adding arbitrary partitions, so you could grow that from the end of the disk inwards. Doing so would cause the two to meet somewhere in the middle.
Recommended, though? I would say absolutely not!
does this end in seek hell, or can the ZFS volume consist of multiple partitions on one disk without degrading performance (without spare copies)?
ZFS is seek-heavy to begin with, because of its copy-on-write design. (This is exacerbated if you use snapshots. Snapshots happen to be a ZFS feature that lots of people really like, even if they don't really use any of ZFS' other features, but they come at a significant cost in terms of risk of data fragmentation, particularly if you make many small writes.)
When you add additional storage to an existing pool, which you would with your proposed scheme, the data already stored is not automatically rebalanced. Instead, ZFS writes new data blocks to vdevs that have the most free space, which causes the data to be distributed across all vdevs over time as it is written. In your case, this means that data will be distributed roughly as you expect: newly written data, including changes to existing files, is more likely to be written to the newly added vdev than to the previously existing vdev(s). Since all vdevs are on the same physical disk, there is no real reliability gain from this; if the single disk dies, then your pool dies, and takes all your data with it.
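The "write to the vdev with the most free space" behaviour can be sketched with a toy model (a deliberate simplification; ZFS's real allocator weighs additional factors, but the tendency is the same: a freshly added vdev absorbs most new writes):

```python
# Simplified sketch of ZFS-like vdev selection: new blocks go to the
# vdev with the most free space, so a newly added (empty) vdev receives
# most new writes until the pool evens out. Illustrative model only.

def pick_vdev(vdevs):
    """Return the index of the vdev with the most free space."""
    return max(range(len(vdevs)), key=lambda i: vdevs[i]["free"])

def write_block(vdevs, size):
    i = pick_vdev(vdevs)
    vdevs[i]["free"] -= size
    return i

# Old, mostly full vdev vs. newly added, empty vdev (units arbitrary):
vdevs = [{"free": 10}, {"free": 100}]
targets = [write_block(vdevs, 1) for _ in range(20)]
print(targets.count(1))  # all 20 writes land on the new vdev
```

Only once the free-space levels approach each other would the allocator start spreading writes across both vdevs.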
ZFS also tries to distribute the data evenly within the vdev, on the assumption that this reduces the risk of localized physical damage or logical errors affecting all copies equally. This is one driving reason behind its Merkle tree on-disk design, and is a likely reason why it tries to place copies as far apart as possible, preferably on unrelated vdevs.
There is, as far as I know, currently no native support in ZFS for rebalancing data between vdevs after adding additional storage. Btrfs has the btrfs balance command, but ZFS has nothing similar. You can get somewhat close by copying the data (using zfs send ... | zfs recv ...), but the rebalancing is only a side effect of the copy.
Bottom line, I doubt your proposed setup would be significantly worse in terms of disk seeks than a similarly set up ZFS on a single partition.
I then created a huge ZFS filesystem on it. If I set copies=2, does that end in seek-hell as well, or is there some clever ZFS mechanism that will store copies of all files on the same disk using a buffer [assumption: seek hell as well + only one copy because you would need several devices for several copies].
First, please keep in mind the difference between ZFS pools and ZFS file systems. You created a pool, which by default contains a single file system with the same name as the pool. The pool dictates the properties of the physical storage, such as vdev configuration and minimum block size (known as ashift in ZFS), and is administered using the zpool utility. The file system dictates the properties of the logical storage, such as compression, quotas, mount points, and checksumming, and is administered using the zfs utility. This is in contrast to e.g. Btrfs, which lumps the two together.
Second, let me briefly introduce you to how the copies ZFS file system property works. copies specifies redundancy above and beyond the physical redundancy and is, in effect, similar to making several user-visible copies of a single file (but doesn't break the user's mental model if deduplication is in use on the file system). While it applies to all vdev redundancy types, this is easiest to illustrate with mirrors: a two-way mirror vdev will, by the simple property of being a mirror, store two identical copies of your data. If you have also set copies=2, then each of the two disks in the mirror pair will also hold two copies of your data, for a total of four copies of the bits being stored and a grand total of about 25% usable storage (compared to the amount of raw storage available). The simple explanation breaks down somewhat with raidzN vdevs, but the result is the same: multiple copies of the same bits are stored on disk, so that if one goes bad, another can be used.
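The arithmetic for the mirror example can be written out explicitly (a back-of-the-envelope calculation that ignores metadata and other overhead, not exact ZFS space accounting):

```python
# Total physical copies of each bit = mirror ways * copies property,
# so usable storage is 1 / total_copies of the raw storage.

def usable_fraction(mirror_ways, copies):
    total_copies = mirror_ways * copies
    return 1 / total_copies

print(usable_fraction(2, 1))  # plain two-way mirror: 0.5
print(usable_fraction(2, 2))  # two-way mirror with copies=2: 0.25
```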
By default, one copy of user data and two copies of file system metadata are stored. By increasing copies, you adjust this behavior so that copies copies of user data (within that file system) and copies plus one copies of file system metadata (within that file system) are stored. For best effect, if you want to set copies to a value greater than one, you should do so when you create the pool, using zpool create -O copies=N, to ensure that additional copies of all root file system metadata are stored.
Under normal read operations, extra copies only consume storage space. When a read error occurs, if there are redundant, valid copies of the data, those copies can be used to satisfy the read request and to transparently rewrite the broken copy. Read errors can be either outright I/O errors, or, if checksumming is turned on (which it is by default, and you really should leave it that way unless you have a really unusual workload), data coming back from the disk as something other than what was originally written (a checksum mismatch).
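The read-time self-healing described above can be sketched as a toy model (this is illustrative pseudologic, not ZFS code; real ZFS stores fletcher or SHA-256 checksums in parent metadata blocks):

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def read_with_repair(copies, expected):
    """Return a copy whose checksum matches; repair any corrupt copies."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies corrupt: unrecoverable read error")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # transparently rewrite the broken copy
    return good

data = b"important bits"
stored = [b"imp0rtant bits", data]  # first copy has a flipped bit
result = read_with_repair(stored, checksum(data))
print(result == data and stored[0] == data)  # True
```

The read succeeds as long as at least one copy still matches its checksum, and the bad copy is rewritten from the good one as a side effect.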
However, during writes, all copies must be updated to ensure that they are kept in sync. Because ZFS aims to place copies far away from each other, this introduces additional seeking. Also don't forget its Merkle tree design, with metadata blocks placed some physical distance away from the data blocks (to guard against, for example, a single write failure corrupting both the checksum and the data). I believe that ZFS aims to place copies at least 1/8 of the vdev apart, and the metadata block containing the checksum for a data block is always placed some distance away from the data block.
Consequently, setting copies greater than 1 does not significantly help or hurt read performance, but it reduces write performance in proportion to the number of copies requested and the IOPS (I/O operations per second) performance of the underlying storage.
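As a very crude model of that write cost (each logical write becomes roughly `copies` physical writes, plus seeks between the far-apart copies; real ZFS batches writes in transaction groups, so treat this as a rough upper bound on the penalty):

```python
# Crude model: effective write IOPS scale inversely with the number
# of copies, since every logical write fans out into `copies` physical
# writes to distant locations on the vdev.

def effective_write_iops(device_iops, copies):
    return device_iops / copies

print(effective_write_iops(200, 1))  # 200.0
print(effective_write_iops(200, 2))  # 100.0
```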
This question is very hard, especially in view of the fact that SSD technology
is in constant evolution, and especially since modern operating systems are
constantly improving their handling of SSD.
In addition, I'm not sure that your problem is with Wear leveling.
It should rather be with SSD optimizations designed to avoid block erases.
Let us first get our terms right:
- An SSD block or erase block is the unit that the SSD can erase in one atomic operation, usually up to 4 MB in size (but 128 KB or 256 KB are more common).
An SSD cannot write to a block without erasing it first.
- An SSD page is the smallest unit that the SSD firmware can track. A block usually contains multiple pages, each usually up to 4 KB in size. The SSD keeps a per-page mapping of where the OS thinks the page is located on the disk (the SSD writes pages wherever it prefers, while the OS thinks in terms of a sequential disk).
- A sector is the smallest element that the operating system thinks a hard disk can write in one operation. The OS will also think in terms of disk cylinders and tracks, even though they do not apply to SSDs. The OS will usually inform the SSD when a sector becomes free (TRIM).
Smart SSD firmware will usually announce its page size to the OS as the sector size where possible.
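The per-page mapping described above (the core of what is often called the flash translation layer, FTL) can be illustrated with a toy model; the class below is hypothetical and vastly simplified compared to real firmware:

```python
# Toy flash translation layer: the OS addresses logical sectors, but the
# SSD writes each update to the next free physical page and merely
# updates its mapping. Overwrites never happen in place; the old page
# becomes stale garbage to be collected later.

class ToyFTL:
    def __init__(self):
        self.mapping = {}   # logical sector -> physical page
        self.next_free = 0  # next never-written physical page

    def write(self, sector, data):
        page = self.next_free
        self.next_free += 1
        self.mapping[sector] = page  # remap; old page is now stale
        return page

ftl = ToyFTL()
print(ftl.write(7, b"v1"))  # sector 7 -> physical page 0
print(ftl.write(7, b"v2"))  # rewrite: sector 7 -> physical page 1
print(ftl.mapping[7])       # 1
```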
It is clear that the SSD firmware would prefer always writing to empty blocks, as they are already erased. Otherwise, adding a page to a block that already contains data requires the sequence read-block/store-page/erase-block/write-block.
Too liberal application of the above will cause pages to be dispersed all over
the SSD and most blocks to become partially empty, so the SSD may soon run out
of empty blocks. To avoid that, the SSD will continuously do
Garbage collection in the background, consolidating partially-written
blocks and ensuring enough empty blocks are available.
Garbage collection introduces another factor - Write amplification - meaning that one OS write to the SSD may require more than one physical write on the SSD.
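Write amplification is commonly quantified as the ratio of bytes physically written to flash to bytes the host asked to write (the numbers below are made up for illustration):

```python
# Write amplification factor: physical flash writes / host writes.
# A factor of 1.0 would be ideal; garbage collection pushes it higher.

def write_amplification(host_bytes, flash_bytes):
    return flash_bytes / host_bytes

# e.g. the host wrote 100 MB, but GC relocation of partially valid
# blocks caused 250 MB of actual flash writes:
print(write_amplification(100, 250))  # 2.5
```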
As an SSD block can only be erased and written a certain number of times before it dies, Wear leveling is designed to distribute block writes uniformly across the SSD so that no block is written much more than others.
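With ideal wear leveling, drive endurance can be roughly estimated from capacity, per-block program/erase cycles, daily host writes, and the write amplification factor (a crude model with made-up numbers, useful only for order-of-magnitude intuition):

```python
# Crude endurance estimate under perfect wear leveling: total writable
# bytes = capacity * P/E cycles, consumed at (host writes * WA) per day.

def lifetime_years(capacity_gb, pe_cycles, daily_writes_gb, write_amp):
    total_writable_gb = capacity_gb * pe_cycles
    days = total_writable_gb / (daily_writes_gb * write_amp)
    return days / 365

# 256 GB drive, 3000 P/E cycles, 20 GB/day of host writes, WA of 2:
print(round(lifetime_years(256, 3000, 20, 2), 1))  # 52.6 (years)
```

In practice other components tend to fail long before the flash wears out at desktop workloads, which is why the write amplification factor matters more for heavy-write server use.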
The question of partition alignment
From the above, it looks as if the mechanism that allows the SSD to map pages to any physical location, while keeping track of wherever the OS thinks they are stored, removes the need for partition alignment. Since a page is not written where the OS thinks it is written, it no longer matters where the OS thinks it writes the data.
However, this ignores the fact that the OS itself attempts to optimize disk accesses. For a classical hard disk, it will attempt to minimize head movements by allocating data accordingly across tracks.
Clever SSD firmware should manipulate the fictional cylinder and tracks
information that it reports to the OS so that track-size will equal
block-size, and page-size will equal sector-size.
When the view the OS has of the SSD is somewhat more in line with reality, the optimizations done by the OS may reduce the need for the SSD to remap pages and to run garbage collection, which will reduce Write amplification and increase the lifetime of the SSD.
It should be noted that too much fragmentation of the SSD (meaning too much page mapping) increases the amount of work done by the SSD.
The 2009 article Long-term performance analysis of Intel Mainstream SSDs indicated that if the drive is abused for too long with a mixture of small and large writes, it can get into a state where the performance degradation is permanent, and that with Wear leveling this condition may extend to more of the drive.
This condition is the reason why many SSD owners see performance degrade over time.
My final advice is to align partitions to respect the erase-block layout. The OS will assume that a partition is well aligned with respect to the disk, and its decisions about file placement might then be made more intelligently. As always, individual idiosyncrasies of the OS driver versus the SSD firmware may invalidate such concerns, but it is better to play it safe.
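Checking alignment is simple arithmetic: the partition's starting byte offset should be a multiple of the erase-block size. A small helper (assuming 512-byte sectors and the 1 MiB alignment convention used by modern partitioning tools, which covers common erase-block sizes):

```python
# A partition start is aligned when its byte offset falls on an
# erase-block boundary. Modern tools start partitions at sector 2048
# (1 MiB) precisely so that common erase-block sizes divide it evenly.

def is_aligned(start_sector, sector_size=512, erase_block=1024 * 1024):
    """True if the partition start falls on an erase-block boundary."""
    return (start_sector * sector_size) % erase_block == 0

print(is_aligned(2048))  # 2048 * 512 B = 1 MiB -> True (modern default)
print(is_aligned(63))    # legacy CHS-aligned start -> False
```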
Best Answer
Short answer: you will be fine either way, but I'd take the large SSD.
Longer answer:
Speed
The bigger variants of the same model usually have better read and (especially) write speeds. Depending on the performance, number, and kind of SATA controllers on your mainboard, it might be faster to have 2 SSDs working in parallel. Or you could create a hardware RAID 0 out of the 2 SSDs, which might increase speed a little.[1]
Whatever you do, most current SSDs will be fast enough[2]. While differences can be measured in benchmarks, you should not notice much difference in day-to-day use.
-> might be no difference in real life for desktop PCs
Durability
Modern SSDs have spare blocks that can be used in their wear-leveling algorithms to improve the durability of the drive. The amount of spare blocks is often proportional to the size of the SSD.
What that means to you:
Since you want to use the SSD mostly for gaming and apps, most modern SSDs will be durable enough for you. And you should have a backup of your important data anyway, so...
-> might be no difference in real life for desktop PCs
Future value
Say, in 2 years you want to buy a new SSD (since even the big one is now too small to hold all your games). If the old one was bigger, it retains more value for you: you could use it to upgrade an old notebook, for example (many notebooks can take only one drive, so only one of the two small SSDs could be reused that way).
-> In this category, the big SSD wins outright.
[1] However, if one of the SSDs breaks for some reason, all data will be lost. Statistically, this is more likely to happen for the RAID0 than for a single drive. Also, the RAID controller might break.
[2] Compared to a regular HDD