Partitioning – Is Partition Alignment to SSD Erase Block Size Pointless?

Many people seem to have the idea (1, 2, 3, 4, 5) that aligning the start of your SSD partitions at a multiple of the SSD erase block size is somehow beneficial. I do not see the benefit; consider the following partitioning (please suspend your disbelief about the 16K erase blocks; in practice they are likely to be much larger, and so are the partitions):

Partitions:      [    1   ]              [        2        ]
Logical blocks:  [ 4K ][ 4K ][ 4K ][ 4K ][ 4K ][ 4K ][ 4K ][ 4K ]
Physical blocks: [ 4K ][ 4K ][ 4K ][ 4K ][ 4K ][ 4K ][ 4K ][ 4K ]
Erase blocks:    [          16K         ][          16K         ]

Now if logical block K corresponded to physical block K for any K (e.g. if there were no wear-levelling done by the SSD controller), then there might be some theoretical merit to this. Suppose for example that partition 2 in the above figure starts one logical / physical block earlier. Then any write at the beginning of partition 2 will cause the erasure of the first erase block as will any write to partition 1, which will cause additional wear to that particular erase block.
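The shared-erase-block scenario can be checked with a little arithmetic. The following is only a sketch under assumptions: it uses the toy sizes from the figure, assumes logical block K maps directly to physical block K, and assumes partition 1 occupies logical blocks 0-1 and partition 2 occupies blocks 4-7. The function name `erase_blocks_touched` is made up for illustration.

```python
LOGICAL_BLOCK = 4 * 1024   # toy size taken from the figure
ERASE_BLOCK = 16 * 1024    # toy size taken from the figure

def erase_blocks_touched(first_block, last_block):
    """Erase-block indices covered by logical blocks first..last (inclusive).

    Assumes logical block K is stored in physical block K (no remapping).
    """
    per_erase = ERASE_BLOCK // LOGICAL_BLOCK
    return set(range(first_block // per_erase, last_block // per_erase + 1))

partition1 = erase_blocks_touched(0, 1)   # assumed extent of partition 1
aligned_p2 = erase_blocks_touched(4, 7)   # partition 2 as drawn
shifted_p2 = erase_blocks_touched(3, 6)   # partition 2 starting one block earlier

print(partition1 & aligned_p2)  # no erase block in common
print(partition1 & shifted_p2)  # erase block 0 is shared
```

With the shifted layout, writes to either partition can hit erase block 0, which is the extra wear described above.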

With wear-levelling, however, there is no fixed correspondence between logical and physical blocks (i.e. logical block K can correspond to an arbitrary physical block L), so erase-block alignment should be completely meaningless. Alignment to the block size should be sufficient, so that pages (for swapping) and filesystem blocks (for data) written out to the partition do not occupy more blocks on the SSD than necessary.
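To illustrate the block-size alignment point, here is a rough sketch that counts how many physical blocks a single filesystem-block write occupies. The 4096-byte block size and the two partition start offsets (sector 63 and sector 2048, with 512-byte sectors) are illustrative assumptions, and `physical_blocks_spanned` is a hypothetical helper.

```python
def physical_blocks_spanned(offset_bytes, length_bytes, block_size=4096):
    """Number of block_size-sized physical blocks touched by a write."""
    first = offset_bytes // block_size
    last = (offset_bytes + length_bytes - 1) // block_size
    return last - first + 1

SECTOR = 512  # assumed logical sector size

# First 4K filesystem block of a partition starting at sector 63 (misaligned)
misaligned = physical_blocks_spanned(63 * SECTOR, 4096)
# First 4K filesystem block of a partition starting at sector 2048 (aligned)
aligned = physical_blocks_spanned(2048 * SECTOR, 4096)

print(misaligned, aligned)
```

In the misaligned case every 4K write straddles two physical blocks, which is the overhead that block-size alignment avoids.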

Best Answer

This question is hard to answer, especially since SSD technology is constantly evolving and modern operating systems keep improving their handling of SSDs.

In addition, I'm not sure that wear leveling is the real issue here. The relevant mechanisms are rather the SSD optimizations designed to avoid block erases.

Let us first get our terms right:

  • An SSD block or Erase block is the unit that the SSD can erase in one atomic operation. It can be as large as 4 MB, but 128 KB or 256 KB are more common. An SSD cannot overwrite data in a block without erasing the block first.
  • An SSD page is the smallest atomic unit that the SSD firmware can track. A block contains multiple pages, each usually up to 4 KB in size. The SSD keeps a per-page mapping of where the OS thinks the page is located on the disk (the SSD writes pages wherever it prefers, although the OS thinks in terms of a sequential disk).
  • A sector is the smallest element that the operating system thinks a hard disk can write in one operation. The OS will also think in terms of disk cylinders and tracks, even if they do not apply to SSD. The OS will usually inform the SSD when a sector becomes free (TRIM). Smart SSD firmware will usually announce to the OS its page-size as the sector-size where possible.

It is clear that the SSD firmware would prefer to always write to empty blocks, as they are already erased. Otherwise, adding a page to a block that already contains data requires the sequence read-block/store-page/erase-block/write-block.
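The read-block/store-page/erase-block/write-block sequence can be sketched as a toy model. This is not how any particular firmware works; the `Block` class, the 4-pages-per-block size and the function name are all assumptions made for illustration, and the slow path is only taken when the target page slot already holds data.

```python
PAGES_PER_BLOCK = 4  # toy value; real blocks hold far more pages

class Block:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK  # None means "erased"
        self.erase_count = 0

    def erase(self):
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1

def write_page_in_place(block, index, data):
    """Write one page; return the number of physical page writes it cost."""
    if block.pages[index] is None:
        block.pages[index] = data        # erased slot: a single page write
        return 1
    snapshot = list(block.pages)         # read-block
    snapshot[index] = data               # store-page (in controller memory)
    block.erase()                        # erase-block
    writes = 0
    for i, page in enumerate(snapshot):  # write-block
        if page is not None:
            block.pages[i] = page
            writes += 1
    return writes

blk = Block()
for i, payload in enumerate(["a", "b", "c"]):
    write_page_in_place(blk, i, payload)
cost = write_page_in_place(blk, 0, "a2")  # overwrite an occupied slot
print(cost, blk.erase_count)
```

In this model the single logical overwrite costs three page writes and one erase, which is why the firmware would rather redirect the write to an already erased block.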

Applying this strategy too liberally will scatter pages all over the SSD and leave most blocks partially empty, so the SSD may soon run out of empty blocks. To avoid that, the SSD continuously performs Garbage collection in the background, consolidating partially written blocks and ensuring that enough empty blocks are available. This operation may look like this:

[Image: garbage collection consolidating partially written blocks]

Garbage collection introduces another factor, Write amplification, meaning that one OS write to the SSD may require more than one physical write on the SSD.
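Write amplification is commonly expressed as the ratio of data physically written to flash to data written by the host. A minimal sketch of that ratio follows; the function name and the example numbers are made up for illustration.

```python
def write_amplification(host_pages_written, flash_pages_written):
    """Ratio of physical flash writes to writes requested by the OS."""
    if host_pages_written <= 0:
        raise ValueError("host_pages_written must be positive")
    return flash_pages_written / host_pages_written

# Hypothetical case: the OS writes 1 page, and garbage collection has to
# relocate 3 still-valid pages to free up a block for it.
waf = write_amplification(host_pages_written=1, flash_pages_written=1 + 3)
print(waf)
```

A value of 1.0 would mean no amplification at all; anything above that is extra wear caused by internal housekeeping.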

As an SSD block can only be erased and written a certain number of times before it dies, Wear leveling is designed to distribute block writes uniformly across the SSD so no block is written much more than others.
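As a rough illustration of the idea, one simple wear-leveling policy is to always pick the free block with the fewest erases so far. Real controllers use more elaborate schemes; the policy, the function name `pick_least_worn` and the 4-block device below are assumptions for the sake of the example.

```python
def pick_least_worn(erase_counts, free_blocks):
    """Return the free block that has been erased the fewest times."""
    return min(free_blocks, key=lambda b: erase_counts[b])

erase_counts = [0, 0, 0, 0]      # toy device with 4 blocks
for _ in range(10):              # ten erase/write cycles
    chosen = pick_least_worn(erase_counts, range(len(erase_counts)))
    erase_counts[chosen] += 1

print(erase_counts)
```

With this policy the erase counts of any two blocks never differ by more than one, so no block wears out much earlier than the others.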

The question of partition alignment

From the above, it looks like the mechanism that allows the SSD to map pages to any physical location, while keeping track of where the OS thinks they are stored, removes the need for partition alignment. Since a page is not written where the OS thinks it is written, it no longer matters where the OS thinks it writes the data.
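The page-mapping mechanism can be pictured as a lookup table from logical to physical pages. The sketch below is a deliberately simplified model, not real firmware: the `ToyFTL` class is hypothetical, it always writes to the next free physical page, and it does not model garbage collection.

```python
class ToyFTL:
    """Toy page map: logical page number -> physical page number."""

    def __init__(self, physical_pages):
        self.physical_pages = physical_pages
        self.mapping = {}
        self.next_free = 0

    def write(self, logical_page):
        if self.next_free >= self.physical_pages:
            raise RuntimeError("no free pages (garbage collection not modelled)")
        self.mapping[logical_page] = self.next_free  # remap, never overwrite
        self.next_free += 1
        return self.mapping[logical_page]

ftl = ToyFTL(physical_pages=8)
first = ftl.write(100)   # OS writes logical page 100
other = ftl.write(7)     # OS writes logical page 7
again = ftl.write(100)   # OS rewrites logical page 100
print(first, other, again)
```

The logical addresses 100 and 7 have no bearing on where the data lands, and the rewrite of page 100 goes to a fresh physical page rather than the old one.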

However, this ignores the fact that the OS itself attempts to optimize disk accesses. For a classical hard disk, it will attempt to minimize head movements by allocating data accordingly on different tracks. Clever SSD firmware should manipulate the fictional cylinder and track information that it reports to the OS so that the track size equals the block size and the sector size equals the page size.

When the view the OS has of the SSD is somewhat more in line with reality, the optimizations done by the OS may reduce the need for the SSD to remap pages and run garbage collection, which will reduce Write amplification and increase the lifetime of the SSD.

It should be noted that too much fragmentation of the SSD (meaning too much remapping of pages) increases the amount of work done by the SSD. The 2009 article Long-term performance analysis of Intel Mainstream SSDs indicated that if the drive is abused for too long with a mixture of small and large writes, it can get into a state where the performance degradation is permanent, and that with Wear leveling this condition may extend to more of the drive. This condition is the reason why many SSD owners see performance degrade over time.

My final advice is to align partitions to the erase-block layout. The OS will assume that a partition is well aligned with respect to the disk, and its decisions about file placement may then be more intelligent. As always, individual idiosyncrasies of the OS driver versus the SSD firmware may invalidate such concerns, but it is better to play it safe.
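For anyone following this advice, the aligned start sector can be computed by rounding up to the next multiple of the erase-block size. This is only a sketch: the erase-block size of a given drive is often not published, so the 256 KB and 4 MB values below are assumptions, as are the 512-byte sector size and the helper name `align_up`.

```python
def align_up(start_sector, erase_block_bytes, sector_bytes=512):
    """Round start_sector up to the next erase-block boundary (in sectors)."""
    if erase_block_bytes % sector_bytes != 0:
        raise ValueError("erase block must be a whole number of sectors")
    sectors_per_erase = erase_block_bytes // sector_bytes
    # Ceiling division, then back to a sector number
    return -(-start_sector // sectors_per_erase) * sectors_per_erase

start_256k = align_up(63, 256 * 1024)       # assumed 256 KB erase block
start_4m = align_up(63, 4 * 1024 * 1024)    # assumed 4 MB erase block
print(start_256k, start_4m)
```

A start that is aligned to the largest plausible erase-block size is also aligned to any smaller power-of-two size, so erring on the large side costs only a little unused space.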