It sounds like the hard disk itself is having problems ("short read," etc.). If so, dmesg | tail will probably show some I/O errors.
Another way to check this is to run badblocks -n on the problem partition, or better, on the entire disk. Whatever you test must be unmounted first, and the scan will take hours on a large modern disk. If any partition that still mounts holds data you can't live without, copy it off onto removable media or a network volume before you start.
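As a sketch of that badblocks pass (the scratch image just makes the example safe to copy-paste; on real hardware you would point badblocks straight at the unmounted device, e.g. /dev/sdb, which is a placeholder name here):

```shell
# Safe stand-in: exercise badblocks against a scratch image rather than a
# real disk. For the real thing, replace scratch.img with the device node
# (unmounted!) -- the flags are the same.
dd if=/dev/zero of=scratch.img bs=1M count=8 status=none

# -n: non-destructive read-write test (reads each block, writes a test
#     pattern, verifies it, then restores the original contents)
# -s: show progress    -o: write any bad blocks found to a file
badblocks -n -s -o badlist.txt scratch.img

# An empty badlist.txt means every block survived the write/read/verify cycle.
wc -c < badlist.txt
```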
The suggestion to mirror the disk is also good. It's kind of a "lite" version of the badblocks -n check: by forcing the disk to read every sector, it can cause the drive to relocate problem blocks, just as badblocks -n will. badblocks -n is more effective, because dodgy sectors can be barely readable and may only reveal themselves to the disk as bad enough to relocate when you attempt to write to them. Still, if the disk has enough life left in it to survive a rescue, the extra read pass won't be enough to finish it off.
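A minimal sketch of such a forced full read pass with plain dd (run here against a scratch image; on the real disk, if= would be the device node, and conv=noerror keeps dd going past unreadable sectors):

```shell
# Stand-in image; for the real disk, if= would be e.g. /dev/sdb.
dd if=/dev/zero of=scratch.img bs=1M count=4 status=none

# Read every block once, continuing past read errors. On a real drive,
# this full pass is what prompts the firmware to remap marginal sectors.
dd if=scratch.img of=/dev/null bs=1M conv=noerror status=none && echo "read pass done"
```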
I don't hold much hope that running fsck on the disk image will recover everything. You'll almost certainly lose sectors in the process, which means some files will be unreadable or corrupted beyond use. A JPEG will still partially decode despite corrupted data, for example, but a JPEG with the bottom ⅔ cropped off might not be of much use to you.
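For what it's worth, fsck runs fine directly against an image file, so you can attempt the repair on a copy and keep the original rescue untouched. A sketch, using a freshly made ext4 image as a stand-in for the rescued one:

```shell
# Stand-in for an image rescued off the dying disk with dd.
dd if=/dev/zero of=rescued.img bs=1M count=16 status=none
mkfs.ext4 -q -F rescued.img

# -f: force a full check even if the fs looks clean; -y: auto-answer yes.
# Work on a copy -- e2fsck rewrites the image in place.
e2fsck -f -y rescued.img
```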
Is my data toasted?
Possibly, possibly not. The badblocks -n pass can sometimes fix the problem. Even if it does, you still need to replace the HDD, since a disk only gets into such a bad state by being nearly dead to start with.
Did I do the wrong thing already?
Other than forgetting the meaning of the word "rigorous," no. :)
Sadly, no.
btrfs doesn't track bad blocks, and btrfs scrub doesn't prevent the next file from hitting the same bad block(s).
This btrfs mailing list post suggests using ext4 with mkfs.ext4 -c (this "builds a bad blocks list and then won't use those sectors").
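A sketch of that ext4 route (demonstrated on a scratch image; on the real partition you would run it against the device node, and -cc does a slower read-write scan instead of the default read-only one):

```shell
dd if=/dev/zero of=fs.img bs=1M count=16 status=none

# -c runs a badblocks scan first and records any hits in the filesystem's
# bad-block inode, so those sectors are never handed out to files.
mkfs.ext4 -q -F -c fs.img

# dumpe2fs -b prints the recorded bad-block list (empty for a healthy image).
dumpe2fs -b fs.img
```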
The suggestion to use btrfs over mdadm 3.1+ with RAID0 will not work.
It seems that LVM doesn't support badblock reallocation.
A work-around is to build a device excluding blocks known to be bad: btrfs over dmsetup.
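As an illustration of that work-around (all device names and sector numbers below are made up; in practice you would derive them from badblocks output), the device-mapper table maps the good ranges with linear targets and plugs the bad run with an error target:

```
# Illustrative dm table -- start/length are in 512-byte sectors
# <start> <length> <target> <args>
0        1000     linear   /dev/sdb1 0       # good run before the bad spot
1000     8        error                      # 8 bad sectors, never touched
1008     199992   linear   /dev/sdb1 1008    # remainder of the partition
```

Loaded with dmsetup create (which reads the table from stdin), the resulting /dev/mapper device can then be formatted with mkfs.btrfs, and btrfs never sees the bad sectors.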
The btrfs Project Ideas wiki says:
Not claimed — no patches yet — Not in kernel yet
Currently btrfs doesn't keep track of bad blocks, disk blocks that are very likely to lose data written to them. Btrfs should accept a list in badblocks' output format, store it in a new btree (or maybe in the current extent tree, with a new flag), relocate whatever data the blocks contain, and reserve these blocks so they can't be used for future allocations. Additionally, scrub could be taught to test for bad blocks when a checksum error is found. This would make scrub much more useful; checksum errors are generally caused by the disk, but while scrub detects afflicted files, which in a backup scenario gives the opportunity to recreate them, the next file to reuse the bad blocks will just start getting errors instead. These two items would match an ext4 feature (used through e2fsck).
Please comment if the status changes and I will update this answer.