linux – Why Case-Insensitive Option in ext4 Was Needed

ext4 filesystems linux

I was reading the Linux 5.2 release notes from last year and noticed that ext4 gained optional support for case-insensitive file names.

So what I am wondering is why the case-insensitive option (including casefolding and normalization) was needed in the kernel. I found another article by Krisman, who wrote the kernel code supporting case-folding filesystems, but the claim that "a case-insensitive file system allows us to resolve important bottlenecks for applications being ported from other operating systems" doesn't really click for me, and I cannot understand how normalization and casefolding help optimize our disk storage.

Thanks so much for your help!

Best Answer

"The claim that 'a case-insensitive file system allows us to resolve important bottlenecks for applications being ported from other operating systems' doesn't really click for me, and I cannot understand how normalization and casefolding help optimize our disk storage."

Wine, Samba, and Android have to provide case-insensitive filesystem semantics. If the underlying filesystem is case-sensitive, then every time a case-sensitive lookup fails, Wine et al. have to scan each directory to prove there are no case-insensitive matches. For example, if looking up /foo/bar/readme.txt fails, you have to perform a full directory listing and case-folded comparison of all files in /foo/bar/*, all directories in /foo/*, and all directories in /*.
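To make the cost concrete, here is a minimal Python sketch of the kind of fallback an emulation layer has to perform on a case-sensitive filesystem. It is not Wine's or Samba's actual code: `resolve_case_insensitive` is a hypothetical helper, and it walks the path top-down rather than bottom-up, but the shape of the work is the same. Every component that misses costs a full `os.listdir` plus a case-folded comparison against every entry in that directory.

```python
import os
from typing import Optional


def resolve_case_insensitive(path: str) -> Optional[str]:
    """Hypothetical helper: resolve `path` case-insensitively on a case-sensitive FS."""
    if os.path.exists(path):
        # Fast path: the exact spelling exists, no scanning needed.
        return path
    current = os.sep if os.path.isabs(path) else os.curdir
    for part in os.path.normpath(path).split(os.sep):
        if part in ("", os.curdir):
            continue
        candidate = os.path.join(current, part)
        if os.path.exists(candidate):
            current = candidate
            continue
        # Slow path: list the whole directory and compare case-folded names.
        try:
            entries = os.listdir(current)
        except OSError:
            return None
        wanted = part.casefold()
        matches = [e for e in entries if e.casefold() == wanted]
        if not matches:
            return None
        # If several entries match, listdir order decides, which is arbitrary.
        current = os.path.join(current, matches[0])
    return current
```

Note that nothing stops another process from renaming or deleting an entry between the `os.listdir` call and whatever the caller does with the returned path.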

There are a few problems with this:

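  • It can get very slow with deeply nested paths (which can generate hundreds of filesystem calls) or with directories holding tens of thousands of files (e.g. incremental backups stored over SMB).
  • These checks introduce race conditions: another process can create, rename, or delete an entry between the directory scan and the open that follows it.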
  • It's fundamentally unsound: if both readme.txt and README.txt exist but an application asks for README.TXT, which file is returned is undefined.
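The unsoundness in the last point is easy to reproduce without touching a disk. In this sketch, `casefold_matches` is a hypothetical helper that returns every entry a case-insensitive lookup would accept. On a case-sensitive directory there can be more than one, and an emulation layer has no principled way to choose between them.

```python
from typing import List


def casefold_matches(entries: List[str], name: str) -> List[str]:
    """Hypothetical helper: every entry a case-insensitive lookup of `name` would accept."""
    wanted = name.casefold()
    return [e for e in entries if e.casefold() == wanted]


# A case-sensitive directory may legally hold both spellings at once.
entries = ["readme.txt", "README.txt", "notes.md"]
print(casefold_matches(entries, "README.TXT"))  # ['readme.txt', 'README.txt']
```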

Android went so far as to emulate case-insensitivity using FUSE/wrapfs and then the in-kernel SDCardFS. However, SDCardFS merely made things faster by moving the work into kernel space†. It still had to walk the filesystem (and was thus I/O-bound), introduced race conditions, and was fundamentally unsound. That is why Google funded† development of native per-directory case-insensitivity in F2FS and has since deprecated SDCardFS.

There have been multiple past attempts to enable case-insensitive lookups via the VFS. The most recent, in 2018, allowed mounting a case-insensitive view of the filesystem. Ted Ts'o specifically cited the problems with wrapfs as the reason for adding this functionality, since it would at least be faster and (I believe) free of race conditions. However, it was still unsound (requesting README.TXT could return either readme.txt or README.txt). It was rejected in favor of adding per-directory case-insensitivity support, and is unlikely to ever make it into the VFS††.

Furthermore, users expect case-insensitivity, so any consumer-oriented operating system has to provide it. Unix couldn't support it natively because Unicode didn't exist yet and strings were just bags of bytes. There are plenty of valid criticisms of how case-folding was handled in the past, but Unicode provides a stable case-folding function that works for all but a single locale (Turkic, and even then only two codepoints are affected). And the filesystem's directory index (its b-tree) is the only reasonable place to implement this behavior.
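Doing the folding inside the filesystem addresses both the cost and the ambiguity: each name is normalized and case-folded once, the directory index is keyed on the folded form, and a colliding name is rejected when it is created. The sketch below is a toy model under those assumptions, not ext4's implementation. `fold_key` follows Unicode's canonical caseless matching (NFD, then full case folding, then NFD again), a Python dict stands in for the on-disk directory index, and, like Python's `str.casefold`, it ignores the Turkic-specific mappings. `CaseFoldedDirectory` is a hypothetical name.

```python
import unicodedata
from typing import Dict, Optional


def fold_key(name: str) -> str:
    """Canonical caseless key per Unicode: NFD(casefold(NFD(name)))."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())


class CaseFoldedDirectory:
    """Toy directory whose index is keyed by the folded name (hypothetical, not ext4's format)."""

    def __init__(self) -> None:
        # Folded key -> the name exactly as it was created (case is preserved for display).
        self._index: Dict[str, str] = {}

    def create(self, name: str) -> None:
        key = fold_key(name)
        if key in self._index:
            # Rejecting the collision up front is what keeps every lookup unambiguous.
            raise FileExistsError(f"{name!r} collides with {self._index[key]!r}")
        self._index[key] = name

    def lookup(self, name: str) -> Optional[str]:
        # A single index probe, instead of listing and comparing every entry.
        return self._index.get(fold_key(name))


d = CaseFoldedDirectory()
d.create("Caf\u00e9.txt")            # precomposed é (U+00E9)
print(d.lookup("CAFE\u0301.TXT"))    # Café.txt  (decomposed E + combining acute)
try:
    d.create("caf\u00e9.TXT")        # same name modulo case: refused
except FileExistsError as err:
    print(err)
```

The normalization step is what lets a precomposed é (U+00E9) and an e followed by a combining acute accent (U+0301) resolve to the same entry; case folding alone would treat them as different names.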

† AFAICT.
††I emailed Krisman, the author of both the VFS-based case-insensitive lookups and per-directory case-insensitive support on EXT4 and F2FS.
