With Software RAID, you don't have to use whole disks.
If you have 3x2TB and 3x1TB disks, and you plan to replace the 1TB drives with 2TB ones in the future, you could use 1TB members. That gives you a RAID5 (or RAID6, if you prefer) over 6x1TB members, one from each disk, plus a RAID5 over 3x1TB members, the remaining halves of the 2TB disks. The 2TB disks are thus shared by both RAIDs.
When you kick out a 1TB disk and add a 2TB disk instead, one RAID will see a replacement member, and the other will have the newly available 1TB added as a new member.
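A rough sketch of this with mdadm, assuming the 2TB disks are sda, sdb and sdc, the 1TB disks are sdd, sde and sdf, and each disk carries 1TB partitions (all device and partition names here are only examples):
mdadm --create /dev/md0 --level=5 --raid-devices=6 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2
The first array takes one 1TB partition from every disk; the second uses the leftover second halves of the 2TB disks.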
When you're using ext4, you can check for bad blocks with the command e2fsck -c /dev/sda1
or whatever. This will "blacklist" the blocks by adding them to the bad block inode.
e2fsck -c runs badblocks on the underlying hard disk. You can use the badblocks command directly on an LVM physical volume (assuming that the PV is in fact a hard disk, and not some other kind of virtual device, like an MD software RAID device), just as you would use that command on a hard disk that contains an ext file system.
That won't add any kind of bad block information to the file system, but I don't really think that's a useful feature of the file system anyway; the hard disk is supposed to handle bad blocks.
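For example, a plain (read-only, non-destructive) scan of a disk used as a PV could look like this; the device name is only an example:
badblocks -sv /dev/sdb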
Even better than badblocks is running a SMART self-test on the disk (replace /dev/sdX with the device name of your hard disk):
smartctl -t long /dev/sdX
smartctl -a /dev/sdX | less
The test itself will take a few hours (it will tell you exactly how long). When it's done, you can query the result with smartctl -a and look for the self-test log. If it says "Completed without error", your hard disk is fine.
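If you only want the self-test log rather than the full report, this should work too:
smartctl -l selftest /dev/sdX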
In other words, how can I check for bad blocks so that they are not used in LVM?
As I said, the hard disk itself will ensure that it doesn't use damaged blocks, and it will also relocate data away from those blocks; that's not something the file system or LVM has to do. On the other hand, once your hard disk has more than just a few bad blocks, you don't want something that relocates them; you want to replace the whole hard disk, because it is failing.
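To get a rough idea of how many sectors the disk has already relocated (or is still waiting to relocate), you can look at the SMART attributes; the exact attribute names vary between vendors, but something along these lines usually works:
smartctl -A /dev/sdX | grep -i -e reallocated -e pending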
Best Answer
In general, as has been mentioned in a comment here and in the mailing list thread you linked to, modern hard drives which are so far gone they’ve got unreplaceable bad blocks should just be discarded. (You’ve explained why you’re interested in this, but it’s worth noting for other readers.)
I don’t think there’s anything in LVM to avoid bad blocks as such; typically you’d address that below LVM, at the device layer. One way of dealing with the problem is to use device mapper: create a table giving the sector mapping required to skip all the bad blocks, and build a device using that. Such a table would look something like this (each line of a device-mapper "linear" table reads: start length linear device offset):
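0 98 linear /dev/sda 0
98 98 linear /dev/sda 99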
etc. (this creates a 196-sector device, using /dev/sda but skipping sector 98). You give this table to dmsetup:
and then create a PV on the resulting /dev/nobbsda device (instead of /dev/sda).
Using this method, with a little forward-planning, you can even handle failing sectors in the future, in the same way as a drive’s firmware does: leave some sectors at the end of the drive free (or even dotted around the drive, if you want to spread the risk), and then use them to fill holes left by failing sectors. Using the above example, if we consider sectors starting from, say, 200 to be spare sectors, and sector 57 becomes bad:
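0 57 linear /dev/sda 0
57 1 linear /dev/sda 200
58 40 linear /dev/sda 58
98 98 linear /dev/sda 99
Here logical sector 57 is redirected to the spare physical sector 200, while all other sectors keep their original mapping, so the device still presents 196 sectors.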
Creating a device-mapper table using a list of bad sectors as given by badblocks is left as an exercise for the reader.
Another solution that would work with an existing LVM setup would be to use pvmove’s ability to move physical extents in order to move LVs out of bad areas. But that wouldn’t prevent those areas from being re-used whenever a new LV is created or an existing LV is resized or moved.
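For example, to push any LVs off a suspect range of physical extents on the PV (the device name and extent numbers are purely illustrative):
pvmove /dev/sda:1000-1999
Without a destination PV, pvmove places the moved extents according to the volume group’s normal allocation rules.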