Can an ext4 filesystem withstand an unreadable (bad) sector?

block-device data-recovery ext4

If an HDD bad sector occurs in the metadata for /home/user/me on an ext4 filesystem, would that mean data loss for all subdirectories?

Background:
I know that many users are satisfied with the ext4 filesystem and are even reluctant to change to more "recently" developed alternatives (e.g. BTRFS), claiming an increased risk of data loss. Indeed, considering how long the ext4 code has been around, the results of efforts to find bugs in it speak for themselves.

With this introduction, my question is:

What resistance does the ext4 filesystem have against a bad sector on a block device? A bad sector could swallow 4K bytes, which I imagine would "wreak havoc" if those 4K happen to hold directory information high up in the directory structure (e.g. the /home/user/me directory).

I am aware that superblocks (being an even more basic source of information) are kept in redundant form in ext4, so I imagine a bad block would be repairable there. I am unsure, though, whether it would be detected automatically.

So my question: can ext4 withstand losing a bad block in its metadata?

I am aware that a bad block in the data/file content will always mean losing that 512-byte/4K sector (however, I am using parchive as a remedy there).

Best Answer

After some investigation into the question of whether ext4 can handle a read error from a block device, my preliminary conclusion is: only partial redundancy exists in ext4.

Here are some findings from my look into ext4's "safety features" (based on the Ext4 wiki and "Inode Structure in EXT4 filesystem"):

With few exceptions, such as inline data, ext4 stores file contents ("data blocks") and filesystem metadata ("metadata blocks") separately. To the best of my understanding, ext4 makes provisions for repair/redundancy only with regard to the latter.
Repair/redundancy of metadata relies on a) the metadata checksum feature, added in 2012, and b) additional copies of crucial metadata.
This "crucial metadata" (stored in copies at different parts of the block device) is:
1. the ext4 superblock
2. the block group descriptors
Checksums protect the superblock, multiple-mount protection, extended attributes, directory entries, HTREE nodes, extents, inodes and group descriptors. Extents (a new ext4 feature that in part supersedes the older Indirect Block Addressing, IBA) are hence protected, but the older IBA blocks are not, as is stated:

Notice that there is neither a magic number nor a checksum to provide any level of confidence that the [IBA] block isn't full of garbage.

What ext4 can hence recover from when a disk sector (512 bytes or 4K) is unreadable

Loss of the superblock or block group descriptors, which have redundant copies stored in all or some specific block groups on disk.
Loss of a directory entry: an unreadable sector there does not imply loss of access to, let alone loss of the content of, the files stored in the directory (only their names are lost). The files (including subdirectories) remain accessible via their filespec <inode-number> in debugfs.
Loss of parts of the inode table. Each entry in the inode table (the table is split up, with the parts written into the block groups that form the ext4 disk layout) occupies 256 bytes (padded). Hence an unreadable sector should mean the loss of only 2 to 16 files. Additionally, with the checksum feature, any corruption within the inode table should, even if not necessarily correctable, at least not go unnoticed.
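The "2 to 16 files" figure is simply the sector size divided by the on-disk inode size. A minimal shell sketch of that arithmetic, assuming the 256-byte inode size mentioned above (the function name `inodes_per_sector` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: number of inode table entries that fit into one
# unreadable sector. Assumes the sector size is a multiple of the inode size.
inodes_per_sector() {
    sector_size="$1"   # bytes, e.g. 512 or 4096
    inode_size="$2"    # bytes, 256 assumed in this answer
    echo $(( sector_size / inode_size ))
}

inodes_per_sector 512 256    # prints 2
inodes_per_sector 4096 256   # prints 16
```

The inode size actually used by a given filesystem is listed as "Inode size" in the output of `tune2fs -l` or `dumpe2fs -h`, so the 256 here should be checked rather than taken for granted.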

Bad-sector-induced troubles that `ext4` seems not to protect against

Non-crucial metadata, including inodes, directory entries, extents and IBA blocks, is not protected by redundant copies.
Inodes: as mentioned, an inode takes 256 bytes and is the core handle to the blocks that make up the file data, so a bad sector means losing access to between 2 and 16 files (mostly irrespective of their size).
Directory entries: if lost due to a bad sector, the file paths to all files inside lose their filename portion. The extent of the loss depends on the size of the bad sector (512 or 4K bytes) on the one hand, and on the length of the filenames and the space taken by the directory hashing feature on the other. It is also my understanding that directory hashing essentially provides some redundancy (but I cannot confirm this).
Extents: lost parts of an extent tree, like the loss of the inode itself, compromise access to the data blocks making up the file's content and would hence mean roughly the loss of one file.
IBA blocks: as for extents; in addition, as mentioned earlier, they are more vulnerable to partial corruption (which, however, was not the main focus of the question).

Additionally: methods used

To test and demonstrate some of the listed weaknesses of the ext4 disk layout (with regard to its resistance to bad sectors), the following tools are handy:

debugfs <blockdev>, which allows accessing files via a filespec (either a file path or, in case of problems, an inode number in < >)
truncate, dd, losetup, mount and mkfs.ext4 to create ext4 filesystems to play with
dumpe2fs and tune2fs, which provide information
dmsetup to assemble a virtual block device simulating a read error, like this:
```
$> dmsetup create badsectordevice << EOF
0 2902 linear /dev/loop1 0
2902 2 error
2904 17576 linear /dev/loop1 2904
EOF
```
In this example the block device sector size is 512 bytes and the ext4 block size is 1024 bytes, hence the two LBA sectors 2902 and 2903, which together make up one ext4 block, are unreadable.
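The table above can also be generated instead of written by hand. The following is a minimal sketch, not a tested recipe: the function names `error_table` and `block_to_sectors` are hypothetical, the total of 20480 sectors is just the sum 2904 + 17576 from the table above, and the filesystem is assumed to start at sector 0 of the backing device.

```shell
#!/bin/sh
# Hypothetical helper: print a device-mapper table that maps a backing device
# linearly, except for one run of sectors that is mapped to the "error" target.
# All positions are in 512-byte sectors, the unit used in dmsetup tables.
error_table() {
    dev="$1"         # backing device, e.g. /dev/loop1
    total="$2"       # total size of the backing device in sectors
    bad_start="$3"   # first unreadable sector
    bad_count="$4"   # number of unreadable sectors
    bad_end=$(( bad_start + bad_count ))

    # Readable part before the bad run (skipped if the run starts at sector 0).
    [ "$bad_start" -gt 0 ] && echo "0 $bad_start linear $dev 0"
    echo "$bad_start $bad_count error"
    # Readable part after the bad run (skipped if the run reaches the end).
    [ "$bad_end" -lt "$total" ] && echo "$bad_end $(( total - bad_end )) linear $dev $bad_end"
    return 0
}

# Hypothetical helper: print "<first sector> <sector count>" for one filesystem
# block, assuming the filesystem starts at sector 0 of the device.
block_to_sectors() {
    block="$1"        # filesystem block number
    block_size="$2"   # filesystem block size in bytes
    per_block=$(( block_size / 512 ))
    echo "$(( block * per_block )) $per_block"
}

# The command substitution is left unquoted on purpose so that it splits into
# the two arguments bad_start and bad_count.
error_table /dev/loop1 20480 $(block_to_sectors 1451 1024)
```

With these values the last line prints the same three table lines as the example above; the output could be piped into `dmsetup create badsectordevice`, which reads the table from standard input. On a real setup the total sector count would presumably come from something like `blockdev --getsz /dev/loop1` rather than being hard-coded.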

Related Solutions

Linux – Find which files are affected by bad blocks on ext4 filesystem

With a combination of dumpe2fs and debugfs, which are included in the e2fsprogs package along with fsck.ext*.
You must use the output of a command as the argument of the next one.
These tools auto-detect the filesystem block size, so it is consistent and safer than direct badblocks invocation.

Prints the registered bad blocks of the filesystem:

# dumpe2fs -b DEVNAME

Prints the inodes which use the given block list:

# debugfs -R "icheck BLOCK ..." DEVNAME

Prints the pathnames to the given inode list:

# debugfs -R "ncheck INODE ..." DEVNAME

debugfs also has an interactive shell and the -f cmd_file option, but they are not particularly powerful or useful in this case.
The -R option allows more automated scripts like this:

#!/bin/sh
# Finds files affected by bad blocks on ext* filesystems.
# Valid only for ext* filesystems with bad blocks registered with
# fsck -c [-c] [-k] or -l|-L options.
# Can be extremely slow on damaged storage (not just a corrupt filesystem).

DEVNAME="$1"
[ -b "$DEVNAME" ] || exit 1

BADBLOCKS="$(dumpe2fs -b "$DEVNAME" | tr '\n' ' ')"
[ -n "$BADBLOCKS" ] || exit 0

INODES="$(debugfs -R "icheck $BADBLOCKS" "$DEVNAME" | awk -F'\t' '
    NR > 1 { bad_inodes[$2]++; }
    END {
        for (inode in bad_inodes) {
            if (inode == "<block not found>") {
                printf("%d unallocated bad blocks\n", bad_inodes[inode]) > "/dev/stderr";
                continue;
            }
            printf inode OFS;
        }
    }
')"
[ -n "$INODES" ] || exit 0

debugfs -R "ncheck -c $INODES" "$DEVNAME"

How to find the last sector used by an ext4 filesystem

You are right. There shouldn't be any problem.

To avoid some calculations you could use the bs option and use the partition name of the device rather than starting at an offset.

dd count=48934 bs=4096 if=/dev/sdxN  of=...
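The count and bs values together determine how many bytes dd copies. A minimal sketch of that arithmetic, assuming that 48934 is the filesystem size in 4096-byte blocks as in the command above (the function name `fs_extent` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: size of a filesystem in bytes and the index of its last
# 512-byte sector, counted from the start of the partition.
fs_extent() {
    blocks="$1"       # filesystem size in blocks
    block_size="$2"   # filesystem block size in bytes
    bytes=$(( blocks * block_size ))
    echo "bytes=$bytes last_sector=$(( bytes / 512 - 1 ))"
}

fs_extent 48934 4096   # prints bytes=200433664 last_sector=391471
```

The sector index is relative to the start of the partition, so the partition's own start offset would still have to be added to get a position on the whole disk.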

To be 100% sure about the size you could test it beforehand by "simulating" a smaller partition:

umount /dev/XYZ
losetup --offset N-BYTES --sizelimit $(( 48934 * 4096 )) /dev/loop1 /dev/XYZ

mount or fsck of /dev/loop1 should tell you if you made it too small. resize2fs would tell you if the partition is still too large, but it has no dry-run mode. You could also play around with fsadm -v --dry-run check/resize ..., which I have never used myself. If you are paranoid, use losetup --read-only. Don't forget losetup --detach when done.