Question 1:
With regards to the `-b` option: this depends on your disk. Modern, large disks have 4 KB blocks, in which case you should set `-b 4096`. You can get the block size from the operating system, and it's also usually obtainable by reading the disk's information off of the label, or by googling the model number of the disk. If `-b` is set to something larger than your block size, the integrity of `badblocks` results can be compromised (i.e. you can get false negatives: no bad blocks found when some may still exist). If `-b` is set to something smaller than the block size of your drive, the speed of the `badblocks` run can be compromised. I'm not sure, but there may be other problems with setting `-b` to something smaller than your block size: since it isn't verifying the integrity of an entire block, it might still be possible to get false negatives if it's set too small.
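On Linux, the sector sizes can be read directly from the OS. A minimal sketch, assuming a device at `/dev/sda` (an example path; substitute your disk):

```shell
# Query the disk's logical and physical sector sizes. Modern "Advanced
# Format" disks typically report a 512-byte logical sector but a
# 4096-byte physical sector; pass the physical size to badblocks -b.
DEV=${DEV:-/dev/sda}    # example device; change to the disk under test
if [ -b "$DEV" ]; then
  echo "logical:  $(blockdev --getss "$DEV") bytes"
  echo "physical: $(blockdev --getpbsz "$DEV") bytes"
else
  echo "no such block device: $DEV"
fi
```

(`blockdev` usually needs root; `lsblk -o NAME,LOG-SEC,PHY-SEC` reports the same values without it.)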
The `-c` option corresponds to how many blocks should be checked at once: batch reading/writing, basically. This option does not affect the integrity of your results, but it does affect the speed at which `badblocks` runs. `badblocks` will (optionally) write, then read, buffer, and check, repeating for every N blocks as specified by `-c`. If `-c` is set too low, your `badblocks` runs will take much longer than necessary, as queueing and processing each separate I/O request incurs overhead, and the disk might also impose additional overhead per request. If `-c` is set too high, `badblocks` might run out of memory; if this happens, it will fail fairly quickly after it starts. Additional considerations here include parallel `badblocks` runs: if you're running `badblocks` against multiple partitions on the same disk (a bad idea), or against multiple disks over the same I/O channel, you'll probably want to tune `-c` to something sensibly high given the memory available to `badblocks`, so that the parallel runs don't fight for I/O bandwidth and can parallelize in a sane way.
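As a rough sketch of the memory cost of `-c` (the numbers here are illustrative, not recommendations): each test buffer is on the order of block size × count bytes, so you can estimate before you run:

```shell
BLOCK=4096     # value you would pass to -b
COUNT=65536    # value you would pass to -c

# Approximate size of one test buffer; non-destructive mode keeps extra
# buffers (for the original data), so real usage is a small multiple.
echo "$(( BLOCK * COUNT / 1024 / 1024 )) MiB per buffer"

# A read-only run with these values might look like (device path is an
# example; never run the write modes against a mounted filesystem):
#   badblocks -b 4096 -c 65536 -s /dev/sda
```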
Question 2:
Contrary to what other answers indicate, the `-w` write-mode test is not more or less reliable than the non-destructive read-write test, but it is twice as fast, at the cost of destroying all of your data. I'll explain why:
In non-destructive mode, `badblocks` does the following:
1. Read existing data, checksum it (read again if necessary), and store it in memory.
2. Write a predetermined pattern (overridable with the `-t` option, though usually not necessary) to the block.
3. Read the block back, verifying that the read data is the same as the pattern.
4. Write the original data back to the disk.
   - I'm not sure about this, but it also probably re-reads and verifies that the original data was written successfully and still checksums to the same thing.
In destructive (`-w`) mode, `badblocks` only does steps 2 and 3 above. This means that the number of read/write operations needed to verify data integrity is cut in half. If a block is bad, the data will be erroneous in either mode. Of course, if you care about the data stored on your drive, you should use non-destructive mode, as `-w` will obliterate all data and leave `badblocks`' patterns written to the disk instead.
Caveat: if a block is going bad, but isn't completely gone yet, some read/write verification pairs may work, and some may not. In this case, non-destructive mode may give you a more reliable indication of the "mushiness" of a block, since it does two sets of read/write verification (maybe--see the bullet under step 4). Even if non-destructive mode is more reliable in that way, it's only more reliable by coincidence. The correct way to check for blocks that aren't fully bad but can't sustain multiple read/write operations is to run `badblocks` multiple times over the same data, using the `-p` option.
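A sketch of such a run (the device path is a placeholder): `-n` selects non-destructive read-write mode, and `-p 4` keeps scanning until four consecutive passes discover no new bad blocks:

```shell
PASSES=4                 # consecutive clean passes required
DEV=/dev/sdX             # placeholder; substitute your actual disk

# Built as a string here so the (long-running) command itself isn't
# executed by accident; run it manually once you've checked $DEV.
CMD="badblocks -n -p $PASSES -s $DEV"
echo "$CMD"
```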
Question 3:
If SMART is reallocating sectors, you should probably consider replacing the drive ASAP. Drives that lose a few sectors don't always keep losing them, but the cause is usually a heavily-used drive getting magnetically mushy, or failing heads/motors resulting in inaccurate or failed reads/writes. The final decision is up to you, of course: based on the value of the data on the drive and the reliability you need from the systems you run on it, you might decide to keep it up. I have some drives with known bad blocks that have been spinning with SMART warnings for years in my fileserver, but they're backed up on a schedule such that I could handle a total failure without much pain.
A sector is marked pending when a read fails. The pending sector will be marked reallocated if a subsequent write fails. If the write succeeds, it is removed from current pending sectors and assumed to be ok. (The exact behavior could differ slightly and I'll go into that later, but this is a close enough approximation for now.)
When you run `badblocks -w`, each pattern is first written, then read. It's possible that the write to the flaky sector succeeds but the subsequent read fails, which again adds it to the pending sector list. I would try writing zeroes to the entire disk with `dd if=/dev/zero of=/dev/sda`, checking the SMART status, then reading the entire disk with `dd if=/dev/sda of=/dev/null` and checking the SMART status again.
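The write-then-read procedure above, as a sketch (`/dev/sdX` is a placeholder, and since the `dd` write destroys everything on the disk, the commands are left commented out):

```shell
DEV=/dev/sdX   # placeholder; THIS PROCEDURE WIPES THE WHOLE DISK

# 1. Full-disk write, then check whether the pending sector cleared:
#    dd if=/dev/zero of="$DEV" bs=1M status=progress
#    smartctl -A "$DEV" | grep -i -e Pending -e Reallocated
# 2. Full-disk read, then check again:
#    dd if="$DEV" of=/dev/null bs=1M status=progress
#    smartctl -A "$DEV" | grep -i -e Pending -e Reallocated
echo "target: $DEV"
```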
Update:
Based on your earlier results with `badblocks -w`, I would have expected the pending sector to be cleared after writing the entire disk. But since that didn't happen, it's safe to say this disk is not behaving as expected.
Let's review the description of Current Pending Sector Count:
Count of "unstable" sectors (waiting to be remapped, because of
unrecoverable read errors). If an unstable sector is subsequently read
successfully, the sector is remapped and this value is decreased. Read
errors on a sector will not remap the sector immediately (since the
correct value cannot be read and so the value to remap is not known,
and also it might become readable later); instead, the drive firmware
remembers that the sector needs to be remapped, and will remap it the
next time it's written.[29] However some drives will not immediately
remap such sectors when written; instead the drive will first attempt
to write to the problem sector and if the write operation is
successful then the sector will be marked good (in this case, the
"Reallocation Event Count" (0xC4) will not be increased). This is a
serious shortcoming, for if such a drive contains marginal sectors
that consistently fail only after some time has passed following a
successful write operation, then the drive will never remap these
problem sectors.
Now let's review the important points:
...the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it's written.[29] However some drives will not immediately remap such sectors when written; instead the drive will first attempt to write to the problem sector and if the write operation is successful then the sector will be marked good.
In other words, the pending sector should have either been remapped immediately, or the drive should have attempted to write to the sector and one of two things should have happened:
- The write failed, in which case the pending sector should have been remapped.
- The write succeeded, in which case the pending sector should have been cleared ("marked good").
I hinted at this earlier, but Wikipedia's description of Current Pending Sector suggests that the current pending sector count should always be zero after a full disk write. Since that is not the case here, we can conclude that either (a) Wikipedia is wrong (or at least incorrect for your drive), or (b) the drive's firmware cannot properly handle this error state (which I would consider a firmware bug).
If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased.
Since the current pending sector count is still unchanged after reading the entire drive, we can assert that either (a) the sector could not be successfully read or (b) the sector was successfully read and marked good, but there was an error reading a different sector. But since the reallocated sector count is still 0 after the read, we can exclude (b) as a possibility and can conclude that the pending sector was still unreadable.
At this point, it would be helpful to know if the drive has logged any new SMART errors. My next suggestion was going to be to check whether Seagate has a firmware update for your drive, but it looks like they don't.
Although I would recommend against continuing to use this drive, it sounds like you might be willing to accept the risks involved (namely, that it could continue to act erratically and/or could further degrade or fail catastrophically). In that case, you can install Linux, boot from a rescue CD, and then (with the filesystems unmounted) use `e2fsck -l filename` to manually mark the appropriate block as bad. (Just make sure you maintain good backups!)
e2fsck -l filename
Add the block numbers listed in the file specified by filename to the
list of bad blocks. The format of this file is the same as the one
generated by the badblocks(8) program. Note that the block numbers are
based on the blocksize of the filesystem. Hence, badblocks(8) must be
given the blocksize of the filesystem in order to obtain correct
results. As a result, it is much simpler and safer to use the -c
option to e2fsck, since it will assure that the correct parameters are
passed to the badblocks program.
(Note that `e2fsck -c` is preferred to `e2fsck -l filename`, and you might even want to try it, but based on your results thus far, I highly doubt `e2fsck -c` will find any bad blocks.)
Of course, you'll have to do some arithmetic to convert the LBA of the faulty sector (as provided by SMART) into a filesystem block number. The Bad Blocks HowTo provides a handy formula:
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
The HowTo also contains a complete example using this formula. After the OS is installed, you can confirm whether a file is occupying the flaky sector using debugfs (see the HowTo for detailed instructions).
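For example, with hypothetical values (not taken from the question), the formula can be evaluated with shell arithmetic, whose integer division supplies the `(int)` part:

```shell
L=167111   # hypothetical LBA of the bad sector, from the SMART log
S=63       # hypothetical start sector of the partition (fdisk -lu)
B=4096     # filesystem block size in bytes (tune2fs -l reports it)

# b = (int)((L - S) * 512 / B)
b=$(( (L - S) * 512 / B ))
echo "filesystem block: $b"    # prints "filesystem block: 20881"
```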
Another option: partition around the suspected bad block
When you install your OS, you could also try to partition around the error. If I did my arithmetic right, the error is at around 81.589 MB, so you can either make /boot a little smaller and start your next partition after sector 167095, or skip the first 82 MB or so completely.
ABRT 235018779
Unfortunately, as for the ABRT error at sector 235018779, we can only speculate, but the ATA8-ACS spec gives us some clues.
From Working Draft AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS):
6.2.1 Abort (ABRT) Error bit 2. Abort shall be set to one if the command is not supported. Abort may be set to one if the device is not
able to complete the action requested by the command. Abort shall also
be set to one if an address outside of the range of user-accessible
addresses is requested if IDNF is not set to one.
Looking at the commands leading up to the ABRT (several READ SECTOR(S) followed by recalibration and reinitialization)...
Abort shall be set to one if the command is not supported. - This seems unlikely.
Abort may be set to one if the device is not able to complete the action requested by the command. - Maybe the P-list of reallocated sectors shifts the user-accessible addresses far enough that a user-accessible address translated to sector 235018779, and the read operation was not able to complete (for what reason, we don't know...but there wasn't a CRC error, so I don't think we can conclude that sector 235018779 is bad).
Abort shall also be set to one if an address outside of the range of user-accessible addresses is requested if IDNF is not set to one. - To me this seems most likely, and I would probably interpret it as the result of a software bug (either your OS or some program you were running). In that case, it is not a sign of impending doom for the hard drive.
Just in case you're not tired of running diagnostics yet...
You could try `smartctl -t long /dev/sda` again to see if it produces any more errors in the SMART log, or you could leave this one as an unsolved X-file ;) and check the SMART log periodically to see whether it happens again. In any case, if you continue to use the drive without getting it to either reallocate or clear the pending sector, you're already taking a risk.
Use a checksumming filesystem
For a little more safety, you may want to consider using a checksumming filesystem such as ZFS or btrfs to help protect against low-level data corruption. And don't forget to perform frequent backups if you have anything that cannot be easily reproduced.
Best Answer
In general, you shouldn't need to, beyond paying attention to what SMART is already telling you. The reason is that SSDs use wear leveling, so they have an advanced controller that already takes care of detecting and re-mapping bad blocks in the background, so from the OS's perspective, and the perspective of standard utilities like badblocks, any blocks that went bad are invisible because they've already been remapped. If badblocks somehow did find a block that was bad, it would be immediately remapped and thus would be "good" again the next time you read it.
To really get an indication of the health of your drive, what you need to know is how many bad blocks the controller has already remapped, and how much spare capacity remains to allow it to remap further. SMART data should give you this for SATA drives; NVMe has equivalent log pages that contain the same information. In particular, the 'Available Spare' attribute reports the percentage of the drive's spare capacity that remains (it starts at 100% and falls as blocks are remapped).
This page has some specific command line tools you can use for SATA or NVMe: https://www.percona.com/blog/2017/02/09/using-nvme-command-line-tools-to-check-nvme-flash-health/
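A sketch of the relevant checks (device names are placeholders; the NVMe command requires the `nvme-cli` package):

```shell
DEV=${DEV:-/dev/sdX}   # placeholder; set to your drive
# SATA SSD: dump the vendor attribute table and look for reallocated /
# pending sectors and any wear or spare-block attributes:
#   smartctl -A "$DEV"
# NVMe: the health log reports "available_spare" and "percentage_used":
#   nvme smart-log /dev/nvme0
#   smartctl -a /dev/nvme0
echo "drive to check: $DEV"
```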