How do case-insensitive filesystems display both upper and lower case file names?

case-sensitivity, filenames, filesystems

This question occurred to me the other day when I was working on a development project that uses a framework that is opinionated about file names. The framework (irrelevant here) wanted file names to start with an upper-case letter. This got me thinking.

On a case-insensitive file system, say exFAT or HFS+ (specifically the non-case-sensitive variant), how does the file system provide access to the same file with both upper and lower case versions of the filename?

For example:

$ cd ~/Documents
$ pwd
/home/derp/Documents

$ cd ../documents
$ pwd
/home/derp/documents

$ cd ../docuMents
$ pwd
/home/derp/docuMents

$ cd ../DOCUMENTS
$ pwd
/home/derp/DOCUMENTS

$ cd ../documentS
$ pwd
/home/derp/documentS

All of these commands will resolve to the same directory. Is this behavior, specifically the output from pwd, just a function of bash showing me what it thinks I want to see?

Another example:

$ ls ~/Documents
Derp.txt    another.txt    whatThe.WORLD

The filesystem here reports the case of the original filename as created by the user or program.

At what point in the filesystem stack is the human-readable filename preserved as it was created (e.g. upper and lower case) so that it can be accessed by any combination of the correct upper and lowercase ASCII characters? Is this just a regex trick somewhere or is there something else going on?

EDIT:
After some more research, it looks like the behavior I am curious about is found in case-preserving, case-insensitive filesystems.

Best Answer

A case-insensitive filesystem just means that whenever the filesystem has to ask "does A refer to the same file/directory as B?" it compares the names of files/directories ignoring differences in upper/lowercase (exactly what upper/lowercase differences count depends on the filesystem—it's non-obvious once you get beyond ASCII). A case-sensitive filesystem does not ignore those differences.

A case-preserving filesystem stores file names as given. A non-case-preserving filesystem does not; it'll typically convert all letters to uppercase before storing them (theoretically, it could use lowercase, or RaNsOm NoTe case, or whatever, but AFAIK all real-world ones used uppercase).

You can put those two attributes together in any combination. I'm not sure if you can find non-case-preserving case-sensitive filesystems, but you could certainly create one. All the other combinations exist or existed in real systems, though.

So a case-preserving, case-insensitive filesystem (the most common type of case-insensitive filesystem nowadays) will store and return file names in whatever capitalization you created them or last renamed them, but when comparing two file names (to check if one exists, to open one, to delete one, etc.) it'll ignore case differences.
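To make the "store as given, compare ignoring case" idea concrete, here is a toy model in plain sh. It is only a sketch: DIR_ENTRIES, fold, lookup and create are made-up names for illustration, and the folding uses tr's idea of upper/lower case, whereas a real filesystem uses its own folding rules inside the driver.

```shell
#!/bin/sh
# Toy model of a case-preserving, case-insensitive directory.
# DIR_ENTRIES holds one stored name per line, spelled exactly as created.
# All names here are hypothetical; no real filesystem is involved.
DIR_ENTRIES=""

# fold NAME: build the comparison key by lowercasing with tr.
# Real filesystems use their own (often Unicode-aware) folding tables.
fold() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# lookup NAME: print the stored spelling of the entry whose folded key
# equals NAME's folded key; return 1 if no entry matches.
lookup() {
    want=$(fold "$1")
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        if [ "$(fold "$entry")" = "$want" ]; then
            printf '%s\n' "$entry"
            return 0
        fi
    done <<EOF
$DIR_ENTRIES
EOF
    return 1
}

# create NAME: store NAME verbatim, unless an entry with the same
# folded key already exists (then fail, like creating a duplicate).
create() {
    if lookup "$1" >/dev/null; then
        return 1
    fi
    DIR_ENTRIES="${DIR_ENTRIES}$1
"
}

create "Derp.txt"
create "whatThe.WORLD"
lookup "DERP.TXT"        # prints Derp.txt
lookup "whatthe.world"   # prints whatThe.WORLD
create "derp.TXT" || echo "already exists"
```

The point is that the name you typed is only ever used as a search key. What comes back from a listing is the stored spelling, which is why ls shows Derp.txt no matter how you addressed the file.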

When you use a case-insensitive filesystem on a Unix box, various utilities will do weird things because Unix traditionally uses case-sensitive filesystems—so they're not expecting Document1 and document1 to be the same file.
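If you want a script to find out which kind of filesystem it is sitting on before it trips over this, one rough approach is to create a mixed-case file and see whether a differently-cased spelling finds it. This is a sketch, not a robust detector: probe_case is a made-up helper name, it assumes mktemp -d with a template is available (it is not in POSIX, though GNU and BSD both ship it), and it only tells you about the directory you point it at.

```shell
#!/bin/sh
# probe_case DIR: print "case-insensitive" or "case-sensitive" for the
# filesystem holding DIR (default: current directory).
# probe_case is a hypothetical helper name; assumes mktemp -d exists.
probe_case() {
    dir=${1:-.}
    # Work inside a fresh, uniquely named directory so we cannot
    # collide with or clobber anything that already exists.
    probe=$(mktemp -d "$dir/CaseProbe.XXXXXX") || return 2
    : > "$probe/MixedCase"
    # On a case-insensitive filesystem the all-lowercase spelling
    # resolves to the file we just created.
    if [ -e "$probe/mixedcase" ]; then
        echo "case-insensitive"
    else
        echo "case-sensitive"
    fi
    rm -rf "$probe"
}

probe_case "${TMPDIR:-/tmp}"
```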

In the pwd case, what you're seeing is that by default it just outputs the path you actually used to get to the directory. So if you got there via cd DirName, it'll use DirName in the output. If you got there via cd DiRnAmE, you'll see DiRnAmE in the output. Bash does this by keeping track of how you got to your current directory in the $PWD environment variable. Mainly this is for symlinks (if you cd into a symlink, you'll see the symlink in your pwd, even though it's actually not part of the physical path to your current directory). But it also gives the somewhat weird behavior you observe on case-insensitive filesystems. I suspect that pwd -P will give you the directory name using the case stored on disk, but I haven't tested that.
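The logical-versus-physical difference is easy to see with a symlink, which works the same on any filesystem. The sketch below uses only pwd -L and pwd -P, both of which are in POSIX. The pwd_demo function and the RealDir and alias names are invented for the example, and mktemp -d is assumed to be available. It does not settle the case question above; for that you would have to run pwd -P after a cd with the "wrong" case on an actual case-insensitive volume.

```shell
#!/bin/sh
# pwd_demo: cd through a symlink, then print the logical and physical
# working directory. Runs in a subshell so the caller's directory is
# left alone. All names (pwd_demo, RealDir, alias) are hypothetical.
pwd_demo() (
    demo=$(mktemp -d) || exit 1
    # Resolve the temp dir first, in case it sits behind a symlink itself.
    demo=$(cd "$demo" && pwd -P) || exit 1
    mkdir "$demo/RealDir"
    ln -s RealDir "$demo/alias"

    cd "$demo/alias" || exit 1
    echo "logical:  $(pwd -L)"    # path as typed, ends in /alias
    echo "physical: $(pwd -P)"    # symlink resolved, ends in /RealDir

    cd / && rm -rf "$demo"
)

pwd_demo
```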