Linux: How does hard-linking to a directory work

hard-link, inode, linux

I'm aware that Linux does not allow hard-linking to a directory. I read somewhere,

  1. that this is to prevent unintentional loops (or graphs, instead of the more desirable tree structure) in the file-system.

  2. that some *nix systems do allow the root user to hard-link to directories.

So, if we are on one such system (that does allow hard-linking to a directory) and if we are the root user, then how is the parent directory entry, .., handled following the deletion of the (hard-link's) target and its parent?

a (200)
|-- .  (200)
|-- .. (100)
|-- b  (300)
|   |-- .  (300)
|   |-- .. (200)
|   `-- c  (400)
|       |-- .  (400)
|       |-- .. (300)
|       `-- d  (500)
|
| <snip>
|
`-- H  (400)

(In the figure above, the numbers in parentheses are inode numbers.)
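
These identities can be observed on a live system. Here is a minimal Python sketch (the temporary directories it creates are purely illustrative, standing in for a, a/b and a/b/c) that prints the inode number of each directory along with the inode numbers its . and .. entries resolve to:

    import os
    import tempfile

    root = tempfile.mkdtemp()                  # stands in for "a" in the figure
    os.makedirs(os.path.join(root, "b", "c"))  # creates a/b and a/b/c

    for d in (root,
              os.path.join(root, "b"),
              os.path.join(root, "b", "c")):
        here   = os.stat(d).st_ino                        # inode of the directory
        dot    = os.stat(os.path.join(d, ".")).st_ino     # inode its "." names
        dotdot = os.stat(os.path.join(d, "..")).st_ino    # inode its ".." names
        print(f"{d}: dir={here} .={dot} ..={dotdot}")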

If a/H is an (attempted) hard-link to the directory a/b/c, then

  1. What should be the reference count stored in inode 400: 2, 3, or 4? In other words, does hard-linking to a directory increase the reference count of the target directory's inode by 1 or by 2?

  2. If we delete a/b/c, the . and .. entries in inode 400 continue to point to valid inodes 400 and 300, respectively. But what happens to the reference count stored in inode 400 if the directory tree a/b is recursively deleted?

Even if inode 400 could be kept intact by a non-zero reference count (of either 1 or 2; see the preceding question), the inode number that its .. entry refers to would still become invalid!

Thus, after the directory tree a/b has been deleted, if the user changes into the a/H directory and then does cd .. from there, what is supposed to happen?

Note: If the default file-system on Linux (ext4) does not allow hard-linking to directories even by a root user, then I'd still be interested in knowing the answer to the above question for an inode-based file-system that does allow this feature.
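
For reference, the premise is easy to check: on Linux, link(2) is rejected when the old path is a directory and fails with EPERM, even for root. A minimal Python sketch, with purely illustrative temporary paths standing in for a/b/c and a/H:

    import errno
    import os
    import tempfile

    tree = tempfile.mkdtemp()                     # stands in for "a"
    os.makedirs(os.path.join(tree, "b", "c"))     # a/b/c

    try:
        os.link(os.path.join(tree, "b", "c"),     # old path: a directory
                os.path.join(tree, "H"))          # new path: the attempted link
    except OSError as err:
        assert err.errno == errno.EPERM           # "Operation not permitted"
        print("link(2) refused for a directory:", err)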

Best Answer

Hard links to directories aren't fundamentally different from hard links to files. In fact, many filesystems do have hard links on directories, but only in a very disciplined way.

In a filesystem that doesn't allow users to create hard links to directories, a directory's links are exactly the following (a sketch after this list shows the resulting link count on a live system):

  1. the . entry in the directory itself;
  2. the .. entries in all the directories that have this directory as their parent;
  3. one entry in the directory that .. points to.
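
Under those rules, a directory's link count comes out to 2 plus the number of its immediate subdirectories. Here is a minimal Python sketch to check that identity (assuming an ext4-style filesystem; some filesystems, btrfs for example, simply report a link count of 1 for directories):

    import os

    def expected_nlink(path):
        # 1 for the entry in the parent, 1 for ".", plus one ".." per subdirectory
        subdirs = [e for e in os.scandir(path)
                   if e.is_dir(follow_symlinks=False)]
        return 2 + len(subdirs)

    def check(path):
        print(path, "st_nlink =", os.stat(path).st_nlink,
              "expected =", expected_nlink(path))

    check(".")   # the two numbers match on ext4-style filesystems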

An additional constraint in such filesystems is that, from any directory, following .. entries must eventually lead to the root. This ensures that the filesystem is presented as a single tree. This constraint can be violated on filesystems that allow hard links to directories.
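
That single-tree property can be demonstrated by repeatedly following .. until it stops changing, which on a well-formed filesystem happens exactly at the root. A minimal Python sketch:

    import os

    def walk_up(start):
        path = start
        while True:
            here   = os.stat(path)
            parent = os.stat(os.path.join(path, ".."))    # follow the ".." entry
            print("inode", here.st_ino)
            if (here.st_dev, here.st_ino) == (parent.st_dev, parent.st_ino):
                break            # at "/" the ".." entry points back at the root
            path = os.path.join(path, "..")

    walk_up(".")                 # terminates because the filesystem is one tree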

Filesystems that allow hard links to directories allow more cases than the three above. However, they maintain two invariants: a directory's . always exists and points to the directory itself, and a directory's .. always points to a directory that has it as an entry. Unlinking a directory entry that is itself a directory only succeeds if that directory contains no entries other than . and ..

Thus a dangling .. cannot happen. What can go wrong is that a part of the filesystem becomes detached: a directory's .. can end up pointing to one of its descendants, so that following ../../../.. eventually loops instead of reaching the root. (As seen above, filesystems that don't allow hard links to directories prevent this.) If all the paths from the root to such a directory are unlinked, the part of the filesystem containing it can no longer be reached, unless some process still has its current directory inside it. That part cannot even be deleted, since there is no way to get at it.
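
To make the failure mode concrete, here is a toy in-memory model (a contrived configuration for illustration, not any real filesystem's on-disk state): a table mapping inode numbers to their name-to-inode entries, in which two directories keep each other alive through their .. entries but have no path from the root:

    dirs = {
        2:   {".": 2,   "..": 2,   "d": 100},   # the root; its ".." is itself
        100: {".": 100, "..": 2},               # /d: ordinary, reachable
        200: {".": 200, "..": 300, "y": 300},   # detached pair whose ".." entries
        300: {".": 300, "..": 200, "x": 200},   # point at each other
    }

    def reachable_from_root():
        seen, stack = set(), [2]
        while stack:
            ino = stack.pop()
            if ino in seen:
                continue
            seen.add(ino)
            stack.extend(i for name, i in dirs[ino].items()
                         if name not in (".", ".."))
        return sorted(seen)

    print(reachable_from_root())        # [2, 100] -- inodes 200 and 300 detached

    ino = 200                           # following ".." from the detached part...
    for _ in range(6):
        ino = dirs[ino][".."]
    print("after six ..:", ino)         # ...never reaches the root, it just loops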

GCFS allows directory hard links and runs a garbage collector to delete such detached parts of the filesystem. You should read its specification, which addresses your concerns in detail. This is an interesting intellectual exercise, but I don't know of any filesystem used in practice that provides garbage collection.
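
The garbage collection described there amounts to a mark-and-sweep over the name graph. A sketch of that idea over the same toy table as above (an assumed structure, not GCFS's actual algorithm; the pinned set stands in for processes whose current directory is still inside a detached part):

    def collect(dirs, root=2, pinned=frozenset()):
        live, stack = set(), [root]
        while stack:                                  # mark phase: walk the names
            ino = stack.pop()
            if ino in live:
                continue
            live.add(ino)
            stack.extend(i for name, i in dirs[ino].items()
                         if name not in (".", ".."))
        for ino in [i for i in dirs if i not in live and i not in pinned]:
            del dirs[ino]                             # sweep phase: reclaim inodes
        return sorted(dirs)

    dirs = {                                          # same toy table as above
        2:   {".": 2,   "..": 2,   "d": 100},
        100: {".": 100, "..": 2},
        200: {".": 200, "..": 300, "y": 300},
        300: {".": 300, "..": 200, "x": 200},
    }
    print(collect(dirs))                              # [2, 100]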
