All these "poke the sector" answers are, quite frankly, insane. They risk (possibly hidden) filesystem corruption. If the data were already gone, because that disk stored the only copy, it'd be reasonable. But there is a perfectly good copy on the mirror.
You just need to have mdraid scrub the mirror. It'll notice the bad sector, and rewrite it automatically.
# echo 'check' > /sys/block/mdX/md/sync_action # use 'repair' instead for older kernels
You need to put the right device in there (e.g., md0 instead of mdX). This will take a while, as it does the entire array by default. On a new enough kernel, you can write sector numbers to sync_min/sync_max first, to limit it to only a portion of the array.
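Limiting the scrub to a range could look roughly like the sketch below. The helper name check_range and its arguments are made up for illustration. It assumes the array is idle, that your kernel exposes sync_min and sync_max, and that the values are sector numbers (the kernel md documentation says they must be multiples of the chunk size).

```shell
#!/bin/bash
# Sketch only: check_range is a hypothetical helper, not part of mdadm.
# $1 = md sysfs directory (e.g. /sys/block/md0/md)
# $2 = first sector, $3 = last sector of the range to check
check_range() {
    local md_dir="$1" start="$2" end="$3" state
    read -r state < "$md_dir/sync_action" || return 1
    if [ "$state" != "idle" ]; then
        echo "ERROR: $md_dir is busy ($state)" >&2
        return 1
    fi
    # Lift any old upper limit first, so the new lower bound is never
    # rejected for being above a stale sync_max.
    echo 'max'    > "$md_dir/sync_max"
    echo "$start" > "$md_dir/sync_min"
    echo "$end"   > "$md_dir/sync_max"
    echo 'check'  > "$md_dir/sync_action"
}
```

As root, you would call it as something like check_range /sys/block/md0/md 0 2097152. As far as I understand the kernel md documentation, the check pauses when it reaches sync_max instead of completing, so afterwards you write 'idle' to sync_action and put 'max' back into sync_max.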
This is a safe operation. You can do it on all of your mdraid devices. In fact, you should do it on all your mdraid devices, regularly. Your distro likely ships with a cron job to handle this; you may just need to enable it.
Script for all RAID devices on the system
A while back, I wrote this script to "repair" all RAID devices on the system. It was written for older kernel versions, where only 'repair' would fix the bad sector; now 'check' alone is sufficient. ('repair' still works fine on newer kernels, but it also re-copies/rebuilds parity, which isn't always what you want, especially on flash drives.)
#!/bin/bash
# Terminal control sequences: save the cursor position, and restore it
# while clearing to end of line, so the progress display updates in place.
save="$(tput sc)"
clear="$(tput rc)$(tput el)"
for sync in /sys/block/md*/md/sync_action; do
    [ -e "$sync" ] || continue    # no md devices: the glob stayed literal
    md="$(echo "$sync" | cut -d/ -f4)"
    cmpl="/sys/block/$md/md/sync_completed"
    # check current state and get it repairing.
    read -r current < "$sync"
    case "$current" in
        idle)
            echo 'repair' > "$sync"
            ;;
        repair)
            echo "WARNING: $md already repairing"
            ;;
        check)
            echo "WARNING: $md checking, aborting check and starting repair"
            echo 'idle' > "$sync"
            echo 'repair' > "$sync"
            ;;
        *)
            echo "ERROR: $md in unknown state $current. ABORT."
            exit 1
            ;;
    esac
    # wait until the array goes back to idle, showing progress meanwhile
    echo -n "Repair $md...$save" >&2
    read -r current < "$sync"
    while [ "$current" != "idle" ]; do
        read -r stat < "$cmpl"
        echo -n "$clear $stat" >&2
        sleep 1
        read -r current < "$sync"
    done
    echo "$clear done." >&2
done
# afterwards, start SMART offline data collection on every disk
for dev in /dev/sd?; do
    echo "Starting offline data collection for $dev."
    smartctl -t offline "$dev"
done
If you want to do check instead of repair, then this (untested) first block should work:
    case "$current" in
        idle)
            echo 'check' > "$sync"
            ;;
        repair|check)
            echo "NOTE: $md $current already in progress."
            ;;
        *)
            echo "ERROR: $md in unknown state $current. ABORT."
            exit 1
            ;;
    esac
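For a regular scheduled scrub, a fire-and-forget variant may be more convenient than one that sits and waits. The following is a minimal sketch under my own assumptions, not the script above: start_checks and its optional root argument are hypothetical (the argument only exists so the function can be tried against a scratch directory), and unlike the original it skips busy arrays instead of aborting.

```shell
#!/bin/bash
# Sketch only: start a 'check' on every idle md array and return at once.
# start_checks is a hypothetical helper; $1 defaults to /sys/block.
start_checks() {
    local root="${1:-/sys/block}" sync md current
    for sync in "$root"/md*/md/sync_action; do
        [ -e "$sync" ] || continue          # glob matched nothing
        md="${sync%/md/sync_action}"        # strip the trailing path...
        md="${md##*/}"                      # ...and keep just "mdN"
        read -r current < "$sync"
        case "$current" in
            idle)
                echo 'check' > "$sync"
                echo "Started check on $md"
                ;;
            *)
                # resync, recover, repair, check, ...: leave it alone
                echo "NOTE: $md is busy ($current), skipping"
                ;;
        esac
    done
}
```

Run as root, start_checks with no argument would act on the real /sys/block. Progress then shows up in /proc/mdstat or in each array's sync_completed file.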
Best Answer
e2fsck -c runs badblocks on the underlying hard disk. You can use the badblocks command directly on an LVM physical volume (assuming that the PV is in fact a hard disk, and not some other kind of virtual device like an MD software RAID device), just as you would use that command on a hard disk that contains an ext file system.
That won't add any kind of bad block information to the file system, but I don't really think that that's a useful feature of the file system; the hard disk is supposed to handle bad blocks.
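A read-only scan of the disk behind a PV might be wrapped like this. scan_readonly is a made-up helper name; the -s and -v flags only add progress and verbose output, and badblocks performs a read-only test unless you ask for -n or -w.

```shell
#!/bin/bash
# Sketch only: scan_readonly is a hypothetical wrapper around badblocks.
# $1 = whole disk or PV device, e.g. /dev/sdX
scan_readonly() {
    local dev="$1"
    if [ ! -b "$dev" ]; then
        echo "ERROR: $dev is not a block device" >&2
        return 1
    fi
    # Default badblocks mode: read every block, write nothing.
    badblocks -sv "$dev"
}
```

Called as root with the device name of the disk, it prints the numbers of any unreadable blocks it finds.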
Even better than badblocks is running a SMART self-test on the disk (substituting the device name of your hard disk for /dev/sdX).
The test itself will take a few hours (it will tell you exactly how long). When it's done, you can query the result with smartctl -a and look for the self-test log. If it says "Completed without error", your hard disk is fine.
As I said, the hard disk itself will ensure that it doesn't use damaged blocks, and it will also relocate data from those blocks; that's not something that the file system or the LV has to do. On the other hand, when your hard disk has more than just a few bad blocks, you don't want something that relocates them; you want to replace the whole hard disk, because it is failing.
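The exact self-test command did not survive in the text above. Given the multi-hour run time, the long (extended) test is my assumption, so treat the commands in the comments below as a sketch. The helper latest_selftest_ok is hypothetical, and it assumes the ATA-style self-test log, in which the newest entry is numbered "# 1".

```shell
#!/bin/bash
# Sketch only. Assumed workflow, run as root:
#   smartctl -t long /dev/sdX          # start the extended self-test
#   smartctl -l selftest /dev/sdX      # later: print just the self-test log
#
# latest_selftest_ok is a hypothetical helper: it reads the self-test log
# on stdin and succeeds only if the most recent entry ("# 1") reports
# "Completed without error".
latest_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}
```

It could then be used as smartctl -l selftest /dev/sdX | latest_selftest_ok && echo "last self-test passed". NVMe and SCSI devices print their logs in a different layout, so this check would need adjusting there.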