How to test SSDs or NVMe drives for bad blocks

bad-blocks smart ssd

With traditional spinning disks, diagnostics are rather easy. If you suspect a drive is faulty, you can check the SMART values, run a SMART extended test, and run a badblocks -wsv test. If all three show no errors, the drive is usually fine.

What should we do in case of SSDs or modern NVMe drives?

Obviously, SMART is still a good idea, but what if it completes without error? Is running badblocks -wsv on a flash-based memory device a good idea?

Are there other options?

Also, if using badblocks, which options are suitable? Should one use the "erase block size" of the SSD?


This question is similar to "Can I prove that an SSD is broken?", but the answers there are from 2013, and we have seen several generations of flash technology since then. Also, while they suggest badblocks, I am missing a discussion of whether this is a good idea at all. Ultimately, some flash memories do not like being written to 100%. Also, how do we tell the SSD afterwards which sectors are free (again)?

"How to fix bad blocks on SSD" is also not satisfactory.

"How safe is it to run CHKDSK on an SSD?" discusses only the impact of chkdsk.

I could not find other resources that deal with this problem.

Best Answer

In general, you shouldn't need to, beyond paying attention to what SMART is already telling you. SSDs use wear leveling, and their controller already takes care of detecting and remapping bad blocks in the background. From the perspective of the OS and of standard utilities like badblocks, any blocks that went bad are therefore invisible, because they have already been remapped. If badblocks somehow did find a bad block, it would be remapped right away and would thus be "good" again the next time you read it.

To really get an indication of the health of your drive, what you need to know is how many bad blocks the controller has already remapped, and how much spare capacity remains for remapping further ones. SMART data should give you this for SATA, and NVMe has equivalent log pages that contain the same information. In particular, the NVMe 'Available Spare' value reports, as a percentage, how much of the drive's spare capacity for remapping is still left; compare it against the 'Available Spare Threshold' reported alongside it.

This page has some specific command line tools you can use for SATA or NVMe: https://www.percona.com/blog/2017/02/09/using-nvme-command-line-tools-to-check-nvme-flash-health/
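
To illustrate the kind of check described above, here is a minimal Python sketch that reads the NVMe health log through smartctl's JSON output and flags the values worth worrying about. It assumes smartmontools 7.0 or newer (for the `-j` option) and assumes the JSON key names shown (`nvme_smart_health_information_log`, `available_spare`, and so on); verify them against the output on your own system. The function names are hypothetical, and the thresholds are a reasonable reading of the fields, not an official health verdict.

```python
import json
import subprocess


def read_nvme_health_log(device: str) -> dict:
    """Run `smartctl -j -a <device>` and return the NVMe health log section.

    Usually needs root. smartctl's exit status is a bitmask that can be
    non-zero even on success, so it is deliberately not checked here.
    """
    result = subprocess.run(
        ["smartctl", "-j", "-a", device],
        capture_output=True,
        text=True,
        check=False,
    )
    report = json.loads(result.stdout)
    # Assumed key name for the NVMe SMART/health log in smartctl's JSON.
    return report.get("nvme_smart_health_information_log", {})


def assess_nvme_health(log: dict) -> list:
    """Return a list of human-readable warnings for an NVMe health log dict."""
    warnings = []

    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare < threshold:
        warnings.append(
            f"available spare {spare}% is below the threshold of {threshold}%"
        )

    if log.get("critical_warning", 0) != 0:
        warnings.append(f"critical warning flags set: {log['critical_warning']}")

    if log.get("media_errors", 0) > 0:
        warnings.append(f"{log['media_errors']} media errors recorded")

    # percentage_used is the vendor's wear estimate and may exceed 100.
    if log.get("percentage_used", 0) >= 100:
        warnings.append("rated endurance reached (percentage used >= 100)")

    return warnings
```

A call such as `assess_nvme_health(read_nvme_health_log("/dev/nvme0"))` (the device path is only an example) returns an empty list when none of these indicators is tripped. For SATA SSDs the equivalent information sits in vendor-specific SMART attributes, so the same approach would need a per-vendor mapping.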
