How do case-insensitive filesystems display both upper and lower case file names

Tags: case-sensitivity, filenames, filesystems

This question occurred to me the other day when I was working on a development project that relies on an opinionated framework with regard to file names. The framework (irrelevant here) wanted to see upper-case-first filenames. This got me thinking.

On a case-insensitive filesystem, say exFAT or HFS+ (specifically its case-insensitive variant), how does the filesystem provide access to the same file under both upper- and lower-case versions of its name?

For example:

$ cd ~/Documents
$ pwd
/home/derp/Documents

$ cd ../documents
$ pwd
/home/derp/documents

$ cd ../docuMents
$ pwd
/home/derp/docuMents

$ cd ../DOCUMENTS
$ pwd
/home/derp/DOCUMENTS

$ cd ../documentS
$ pwd
/home/derp/documentS

All of these commands resolve to the same directory. Is this behavior, specifically the output from pwd, just bash showing me what it thinks I want to see?

Another example:

$ ls ~/Documents
Derp.txt    another.txt    whatThe.WORLD

The filesystem here reports the case of the original filename as created by the user or program.

At what point in the filesystem stack is the human-readable filename preserved as it was created (e.g. upper and lower case) so that it can be accessed by any combination of upper- and lower-case ASCII characters? Is this just a regex trick somewhere, or is there something else going on?

EDIT:
After some more research, it looks like the behavior I am curious about is that of case-preserving, case-insensitive filesystems.

Best Answer

A case-insensitive filesystem just means that whenever the filesystem has to ask "does A refer to the same file/directory as B?" it compares the names of files/directories ignoring differences in upper/lowercase (exactly what upper/lowercase differences count depends on the filesystem—it's non-obvious once you get beyond ASCII). A case-sensitive filesystem does not ignore those differences.

A case-preserving filesystem stores file names as given. A non-case-preserving filesystem does not; it'll typically convert all letters to uppercase before storing them (theoretically, it could use lowercase, or RaNsOm NoTe case, or whatever, but AFAIK all real-world ones used uppercase).
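As a toy illustration of the two storage policies (not how any real filesystem is implemented), here is a shell sketch. The function `stored_name` and the policy words `preserve` and `upper` are hypothetical, and only ASCII letters are handled:

```shell
#!/bin/sh
# Toy model only: what name ends up in the directory entry under each policy.
# "preserve" = case-preserving, "upper" = non-case-preserving (uppercases).
UPPER=ABCDEFGHIJKLMNOPQRSTUVWXYZ
LOWER=abcdefghijklmnopqrstuvwxyz

stored_name() {   # usage: stored_name POLICY NAME
  case $1 in
    preserve) printf '%s\n' "$2" ;;
    upper)    printf '%s\n' "$2" | tr "$LOWER" "$UPPER" ;;
    *)        echo "unknown policy: $1" >&2; return 1 ;;
  esac
}

stored_name preserve whatThe.WORLD   # prints whatThe.WORLD
stored_name upper    whatThe.WORLD   # prints WHATTHE.WORLD
```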

You can put those two attributes together in any combination. I'm not sure if you can find non-case-preserving case-sensitive filesystems, but you could certainly create one. All the other combinations exist or existed in real systems, though.

So a case-preserving, case-insensitive filesystem (the most common type of case-insensitive filesystem nowadays) will store and return file names in whatever capitalization you created them or last renamed them, but when comparing two file names (to check if one exists, to open one, to delete one, etc.) it'll ignore case differences.
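The combination can be modeled in a few lines of shell: comparisons fold case, but the stored spelling is what gets returned. This is a toy sketch under stated assumptions (ASCII-only folding, whereas real filesystems use their own Unicode case tables; directory entries passed as arguments; `fold_case` and `ci_lookup` are hypothetical names):

```shell
#!/bin/sh
# Toy model of a case-preserving, case-insensitive directory (ASCII only).
# Entries keep the spelling they were created with; comparisons ignore case.
UPPER=ABCDEFGHIJKLMNOPQRSTUVWXYZ
LOWER=abcdefghijklmnopqrstuvwxyz

fold_case() {   # map ASCII A-Z to a-z, the only case folding this toy knows
  printf '%s\n' "$1" | tr "$UPPER" "$LOWER"
}

# ci_lookup WANTED ENTRY...: print the stored spelling of the entry that
# matches WANTED ignoring case, or return 1 if nothing matches.
ci_lookup() {
  wanted=$(fold_case "$1")
  shift
  for entry in "$@"; do
    if [ "$(fold_case "$entry")" = "$wanted" ]; then
      printf '%s\n' "$entry"
      return 0
    fi
  done
  return 1
}

# The same directory listing as the question's ls example.
ci_lookup DERP.TXT Derp.txt another.txt whatThe.WORLD        # prints Derp.txt
ci_lookup whatthe.world Derp.txt another.txt whatThe.WORLD   # prints whatThe.WORLD
```

Every spelling that folds to the same string finds the same entry, and what comes back is always the spelling it was created with, which is what ls shows in the question.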

When you use a case-insensitive filesystem on a Unix box, various utilities will do weird things because Unix traditionally uses case-sensitive filesystems—so they're not expecting Document1 and document1 to be the same file.
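A quick way to see which kind of filesystem a directory lives on is to create a name in one case and look it up in another; a tool written for case-sensitive filesystems would not expect the second lookup to succeed. A minimal sketch, assuming a writable directory and an `mktemp` that supports `-d` (GNU and BSD versions do; it is not POSIX); `fs_case_mode` is a hypothetical helper name:

```shell
#!/bin/sh
# Probe whether the filesystem holding DIR compares names case-insensitively:
# create a file with an upper-case name, then ask for it in lower case.
fs_case_mode() {   # usage: fs_case_mode DIR; prints "insensitive" or "sensitive"
  probe=$(mktemp -d "$1/CaseProbe.XXXXXX") || return 1
  : > "$probe/ABC"
  if [ -e "$probe/abc" ]; then
    mode=insensitive
  else
    mode=sensitive
  fi
  rm -rf "$probe"
  printf '%s\n' "$mode"
}

fs_case_mode "${TMPDIR:-/tmp}"
```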

In the pwd case, what you're seeing is that it by default just outputs the path you actually used to get to the directory. So if you got there via cd DirName, it'll use DirName in the output. If you got there via DiRnAmE, you'll see DiRnAmE in the output. Bash does this by keeping track of how you got to your current directory in the $PWD environment variable. Mainly this is for symlinks (if you cd into a symlink, you'll see the symlink in your pwd, even though it's actually not part of the path to your current directory). But it also gives the somewhat weird behavior you observe on case-insensitive filesystems. I suspect that pwd -P will give you the directory name using the case stored on disk, but haven't tested.
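The symlink behavior is easy to reproduce on any filesystem. A minimal sketch, assuming a POSIX shell and an `mktemp` that supports `-d`. It only demonstrates logical versus physical paths; it does not test what `pwd -P` prints on a case-insensitive filesystem, which remains unverified, as noted above:

```shell
#!/bin/sh
# cd is logical by default: $PWD (and pwd, i.e. pwd -L) remembers the path
# you used, while pwd -P asks the filesystem for the physical directory.
tmp=$(mktemp -d) || exit 1        # scratch dir; mktemp -d is GNU/BSD, not POSIX
mkdir "$tmp/real"
ln -s real "$tmp/link"            # link -> real

cd "$tmp/link" || exit 1
logical=$(pwd -L)                 # ends in /link, same as $PWD
physical=$(pwd -P)                # ends in /real
cd / && rm -rf "$tmp"

printf 'logical:  %s\nphysical: %s\n' "$logical" "$physical"
```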

Related Solutions

Shell – change entire directory tree to lower-case names

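I don't know whether your Unix flavor has a rename command. Many Linux distributions do; the one meant here is part of a Perl package, if you need to search your repository for it. (The util-linux rename found on some distributions uses a different syntax and does not accept Perl expressions.)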

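find . -depth -exec rename -n 's{([^/]+)$}{\L$1}' {} +

The expression lowercases only the last component of each path, and -depth makes find list a directory's contents before the directory itself, so every path is still valid when its turn comes. (Transliterating the whole path, as with 'y/A-Z/a-z/', would also lowercase parent directories that have not been renamed yet; on a case-sensitive filesystem those renames fail.)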

With -n, rename doesn't actually perform the renames; it only prints what would be done. Omit the -n to do it for real.
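If no Perl rename is available, the same job can be sketched with find and mv. This is an illustration under assumptions, not a drop-in tool: `lowercase_tree` is a hypothetical name, `-mindepth` is a GNU/BSD find extension, only ASCII A-Z is folded, and names ending in a newline are not handled. Each entry is moved through a temporary name because a direct `mv Foo foo` can misbehave on a case-insensitive filesystem, where `foo` already resolves to `Foo` (for a directory, mv may try to move it into itself):

```shell
#!/bin/sh
# lowercase_tree DIR: rename everything below DIR to lower case, deepest first
# (-depth), so parents are renamed only after their contents.
lowercase_tree() {
  find "$1" -depth -mindepth 1 -exec sh -c '
    for path do
      dir=${path%/*}
      base=${path##*/}
      lower=$(printf "%s" "$base" | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz)
      [ "$base" = "$lower" ] && continue          # already lower case
      tmp="$dir/.lowercase-tmp.$$"
      mv -- "$path" "$tmp" || continue            # step 1: move out of the way
      if [ -e "$dir/$lower" ] || [ -L "$dir/$lower" ]; then
        # A different entry already has the lower-case name: do not clobber it.
        printf "skipped %s: %s exists\n" "$path" "$dir/$lower" >&2
        mv -- "$tmp" "$path"
      else
        mv -- "$tmp" "$dir/$lower"                # step 2: final name
      fi
    done
  ' sh {} +
}
```

Usage: `lowercase_tree ./MyProject`. The starting directory itself is left alone; only names below it change.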

Case-insensitive search of duplicate file-names

If you have GNU utilities (or at least a set that can deal with zero-terminated lines) available, another answer has a great method:

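find . -maxdepth 1 -print0 | sort -fz | uniq -diz

sort needs -f (fold case) here: uniq only compares adjacent lines, and a plain sort can separate names that differ only in case (in the C locale, for example, Foo, Zed and foo sort in that order).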

Note: the output will have zero-terminated strings; the tool you use to further process it should be able to handle that.
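To read that output in a terminal, the NUL terminators can be turned back into newlines at the very end of the pipeline. A minimal sketch, assuming GNU coreutils (for `sort -z` and `uniq -z`) and a hypothetical wrapper name `ci_dupes`; `uniq -d` prints one representative of each group of colliding names:

```shell
#!/bin/sh
# ci_dupes DIR: list entries of DIR whose names collide when case is ignored.
# tr turns the NUL terminators into newlines for display, which is only
# ambiguous if a name itself contains a newline.
ci_dupes() {
  find "$1" -maxdepth 1 -print0 | sort -fz | uniq -diz | tr '\0' '\n'
}

ci_dupes .
```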

In the absence of tools that deal with zero-terminated lines, or if you want to make sure your code works in environments where such tools are not available, you need a small script:

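#!/bin/sh
for f in *; do
  [ -e "$f" ] || [ -L "$f" ] || continue   # empty directory: skip the literal '*'
  # -ipath compares the whole "./name" path case-insensitively (-iname never
  # matches a pattern containing '/'); one empty line per match lets wc -l
  # count safely even for names with newlines. Glob characters in "$f" still
  # act as patterns.
  find . -maxdepth 1 -ipath ./"$f" -exec echo \; | wc -l | while read -r count; do
    [ "$count" -gt 1 ] && printf '%s\n' "$f"
  done
done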

What is this madness? See this answer for an explanation of the techniques that make this safe for crazy filenames.
