Are there widespread filesystems which represent directories with structures optimized for fast lookup

directory filesystems

In "The Art of Unix Programming", on the topic of The Terminfo Database I read:

If you look in the terminfo directory, you'll see subdirectories named
by single printable characters. Under each of these are the entries
for each terminal type that has a name beginning with that letter. The
goal of this organization was to avoid having to do a linear search of
a very large directory; under more modern Unix file systems, which
represent directories with B-trees or other structures optimized for
fast lookup, the subdirectories won't be necessary.

I wonder if there are widespread (i.e. production ready) filesystems with this quality.

Best Answer

There are several, e.g. ext4 (whose directory index, the HTree, is a hash-keyed B-tree variant), Microsoft's NTFS, Apple's HFS+, and the up-and-coming btrfs, all of which use B-trees. There are also HFS and Reiser4, which use B*-trees, a more densely packed variant of the B-tree.
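To make the difference concrete, here is a minimal sketch contrasting a linear scan of directory entries with a lookup in a sorted index. It is not a B-tree: binary search over a sorted list is used only as a stand-in for "a structure optimized for fast lookup", and the function names and sample entry names are made up for illustration.

```python
from bisect import bisect_left


def linear_lookup(entries, name):
    """Scan (name, inode) pairs in order; return (inode, entries examined)."""
    for examined, (entry_name, inode) in enumerate(entries, start=1):
        if entry_name == name:
            return inode, examined
    return None, len(entries)


def sorted_lookup(sorted_names, inodes, name):
    """Binary-search a sorted name list; return the matching inode or None."""
    i = bisect_left(sorted_names, name)
    if i < len(sorted_names) and sorted_names[i] == name:
        return inodes[i]
    return None


if __name__ == "__main__":
    # Hypothetical directory with 10,000 entries and made-up inode numbers.
    entries = [(f"term{i:05d}", 1000 + i) for i in range(10_000)]
    names = [n for n, _ in entries]      # already sorted by construction
    inodes = [ino for _, ino in entries]

    print(linear_lookup(entries, "term09999"))         # examines every entry
    print(sorted_lookup(names, inodes, "term09999"))   # about log2(n) probes
```

With a linear directory, the cost of finding the last entry grows with the number of entries, which is what the single-letter subdirectories in terminfo were working around.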

Related Solutions

Unix Filesystems – How Directories Are Implemented

The internal structure of directories is dependent on the filesystem in use. If you want to know precisely what happens, have a look at filesystem implementations.

Basically, in most filesystems, a directory is an associative array mapping filenames (keys) to inode numbers (values). Something like this¹:

1167010 .
1158721 ..
1167626 subdir
 132651 barfile
 132650 bazfile

This list is encoded in some – more or less – efficient way inside a chain of (usually) 4KB blocks. Notice that the content of regular files is stored similarly. For directories, however, the number of bytes actually used inside these blocks is of no interest, so the reported size is simply that of the allocated blocks. That's why the sizes of directories reported by du are multiples of 4KB.
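The name-to-inode mapping can also be read from a program. The following is a small sketch using Python's os.scandir; the helper name list_entries and the file names in the demo are illustrative only. Note that os.scandir omits the "." and ".." entries that ls -a shows.

```python
import os
import tempfile


def list_entries(path):
    """Return (inode, name) pairs for the entries of `path`, sorted by name."""
    with os.scandir(path) as it:
        pairs = [(entry.inode(), entry.name) for entry in it]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as testdir:
        # Recreate a layout similar to the listing above.
        os.mkdir(os.path.join(testdir, "subdir"))
        for name in ("barfile", "bazfile"):
            open(os.path.join(testdir, name), "w").close()

        for inode, name in list_entries(testdir):
            print(f"{inode:>8} {name}")
```

The inode numbers printed will differ from one system and run to the next; only the shape of the output matches the listing.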

Inodes are there to tie blocks together, forming a single entity, namely a 'file' in the general sense. They are identified by a number which is some kind of address and each one is usually stored as a single, special block.

Management of all this happens in kernel mode. Software just asks for the creation of a directory with a function named int mkdir(const char *pathname, mode_t mode); leading to a system call, and all the rest is performed behind the scenes.

About links structure:

A hard link is not a file, it's just a new directory entry (i.e. a name – inode number association) referring to a preexisting inode entity². This means that the same inode can be accessed from different pathnames. In particular, since metadata (permissions, ownership, timestamps…) is stored within the inode, it is unique and independent of the pathname chosen to access the file.
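This can be checked with a short sketch, assuming a filesystem that supports hard links. The function name hard_link_demo and the file names are invented for the example. It creates a second name for a file, then confirms that both names share one inode and that a permission change made through one name is visible through the other.

```python
import os
import stat
import tempfile


def hard_link_demo(workdir):
    """Create a file plus a hard link and report what the two names share."""
    original = os.path.join(workdir, "original.txt")
    alias = os.path.join(workdir, "alias.txt")

    with open(original, "w") as f:
        f.write("hello\n")
    os.link(original, alias)        # new directory entry, same inode

    os.chmod(alias, 0o600)          # change metadata through the second name

    st_original = os.stat(original)
    st_alias = os.stat(alias)
    return {
        "same_inode": st_original.st_ino == st_alias.st_ino,
        "link_count": st_original.st_nlink,
        "mode_via_original": stat.S_IMODE(st_original.st_mode),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(hard_link_demo(workdir))
```

The link count reported by stat is the number of directory entries currently referring to the inode.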

A symbolic link is a file, and it is distinct from its target. This means that it has its own inode. It used to be handled just like a regular file: the target path was stored in a data block. But now, for efficiency reasons in recent ext filesystems, paths shorter than 60 bytes are stored within the inode itself (using the fields which would normally be used to store the pointers to data blocks).
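A minimal sketch of the same idea, with made-up file names and a hypothetical helper symlink_demo: os.lstat inspects the link itself, os.stat follows it to the target, and os.readlink returns the stored path. Whether that path lives in a data block or inside the inode is a filesystem detail that is not visible at this level.

```python
import os
import tempfile


def symlink_demo(workdir):
    """Create a file and a symlink to it; compare their inodes."""
    target = os.path.join(workdir, "target.txt")
    link = os.path.join(workdir, "link.txt")

    open(target, "w").close()
    os.symlink("target.txt", link)  # the stored path is relative to the link

    return {
        "target_inode": os.stat(target).st_ino,
        "link_inode": os.lstat(link).st_ino,      # the link's own inode
        "followed_inode": os.stat(link).st_ino,   # stat follows the link
        "stored_path": os.readlink(link),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        for key, value in symlink_demo(workdir).items():
            print(f"{key}: {value}")
```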

—
^{1. this was obtained using ls -ai1 testdir.}
^{2. whose type must be different than 'directory' nowadays.}

Linux – Are there any filesystems for which `ln -d` succeeds

First, a note: the standard ln command does not have options like -d, -F, or --directory; these are non-portable GNUisms.

The feature you are looking for is implemented by the link(1) command.

Back to your original question:

On a typical UNIX system, the decision whether hard links to directories are possible is made in the filesystem driver.

The Solaris UFS driver supports hard links on directories, the ZFS driver does not.

The reason why UFS on Solaris supports hard links is that AT&T was interested in this feature - UFS from BSD does not support hard linked directories.

The reason why ZFS does not support hardlinked directories is that Jeff Bonwick does not like that feature.

Regarding Linux, I would guess that Linux blocks attempts to create hard links to directories in the upper kernel layers. The reason for this assumption is that Linus Torvalds wrote code for Git that shredded directories when git clone was called as root on a platform that supports hard-linked directories.

Note that a filesystem that supports creating hard-linked directories also needs to support using unlink(1) to remove non-empty directories as root.

So if we assume that Torvalds knows how Linux works, and if Linux did support hard-linked directories, Torvalds should have known that calling unlink(2) on a directory as root will not return an error but will shred that directory. In other words, it is unlikely that Linux permits a filesystem driver to implement hard-linked directories.
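The guess is easy to test empirically. The sketch below (the function name try_directory_hard_link and the directory names are invented for the example) asks the kernel for a hard link to a directory and reports the error, if any. On Linux I would expect EPERM regardless of the filesystem or of running as root, but that is an expectation to verify on your own system, not something the code guarantees.

```python
import errno
import os
import tempfile


def try_directory_hard_link(workdir):
    """Attempt link(2) on a directory.

    Returns the symbolic errno name on failure, or None if the link succeeded.
    """
    source = os.path.join(workdir, "somedir")
    alias = os.path.join(workdir, "somedir_alias")
    os.mkdir(source)
    try:
        os.link(source, alias)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        outcome = try_directory_hard_link(workdir)
        if outcome is None:
            print("hard link to directory succeeded")
        else:
            print(f"hard link to directory refused: {outcome}")
```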
