Filesystems – Why Hard Links Exist

I know what hard links are, but why would I use them? What is the utility of a hard link?

Best Answer

The main advantage of hard links is that, compared to soft links, there is no size or speed penalty. Soft links are an extra layer of indirection on top of normal file access; the kernel has to dereference the link when you open the file, and this takes a small amount of time. The link also takes a small amount of space on the disk, to hold the text of the link. These penalties do not exist with hard links because they are built into the very structure of the filesystem.
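To make the size difference concrete, here is a minimal Python sketch. It assumes Python 3 on a Unix-like system, and the file names are made up for illustration. It creates a file, a hard link and a symlink, then inspects them with os.lstat(), which does not follow symlinks. The hard link reports the very same inode as the original. The symlink is a separate inode whose size is the length of the path text it stores.

```python
import os
import tempfile

def compare_links(workdir):
    """Create a file, a hard link and a symlink to it; return their lstat() results."""
    original = os.path.join(workdir, "original.txt")
    hard = os.path.join(workdir, "hard.txt")
    soft = os.path.join(workdir, "soft.txt")

    with open(original, "w") as f:
        f.write("hello\n")

    os.link(original, hard)            # new directory entry, same inode
    os.symlink("original.txt", soft)   # new inode whose content is the text "original.txt"

    # lstat() does not follow symlinks, so for soft.txt we see the link itself.
    return os.lstat(original), os.lstat(hard), os.lstat(soft)

with tempfile.TemporaryDirectory() as d:
    orig_st, hard_st, soft_st = compare_links(d)
    print("hard link shares inode:", hard_st.st_ino == orig_st.st_ino)
    print("symlink has own inode: ", soft_st.st_ino != orig_st.st_ino)
    print("symlink size in bytes: ", soft_st.st_size)  # length of "original.txt"
```

On filesystems that keep short symlink targets inside the inode itself, that text may not need a separate data block, but the symlink still costs an inode of its own.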

The best way I know of to see that hard links are part of the filesystem's basic structure is:

$ ls -id .
1069765 ./
$ mkdir tmp ; cd tmp
$ ls -id ..
1069765 ../

The -i option to ls makes it print the inode number of each file. On the system where I prepared the example above, I happened to be in a directory with inode number 1069765, but the specific value doesn't matter. It's just a number that uniquely identifies a particular file or directory within its filesystem.

What this shows is that when we go into a subdirectory and look at a different filesystem entry called .., it has the same inode number we got before. This isn't happening because the shell is interpreting .. for you, as happens with MS-DOS and Windows. On Unix filesystems, .. is a real directory entry: a hard link pointing back to the parent directory.
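The same check can be scripted. This is a minimal Python sketch, assuming a Unix-like system. The subdirectory name tmp simply mirrors the session above. It compares the parent directory with the .. entry seen from inside a freshly created subdirectory. It compares the device number as well as the inode number, since inode numbers are only unique within one filesystem.

```python
import os
import tempfile

def dotdot_matches_parent(parent):
    """Return True if parent/tmp/.. has the same (device, inode) pair as parent."""
    sub = os.path.join(parent, "tmp")
    os.mkdir(sub)                                    # like: mkdir tmp
    parent_st = os.stat(parent)                      # like: ls -id .
    dotdot_st = os.stat(os.path.join(sub, ".."))     # like: cd tmp; ls -id ..
    return (parent_st.st_dev, parent_st.st_ino) == (dotdot_st.st_dev, dotdot_st.st_ino)

with tempfile.TemporaryDirectory() as d:
    print(dotdot_matches_parent(d))
```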

Hard links are the tendons that tie the filesystem's directories together. Once upon a time, Unix didn't have hard links. They were added to turn Unix's original flat file system into a hierarchical filesystem.

(For more on this, see the question "Why does '/' have an '..' entry?")

It is also somewhat common on Unix systems for several different commands to be implemented by the same executable, typically by giving that one file several names with hard links. It doesn't seem to be the case on Linux anymore, but on systems I used in the past, cp, mv and rm were all the same executable. It makes sense if you think about it: when you move a file between volumes, it is effectively a copy followed by a removal, so mv already had to implement the other two commands' functions. The executable can figure out which operation to provide because it is passed the name it was invoked by (argv[0]).

Another example, common on embedded Linux systems, is BusyBox, a single executable that implements dozens of commands.
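Here is a hedged Python sketch of that multi-call pattern. The OPERATIONS table, the multicall file name and the functions dispatch() and install_names() are all hypothetical. One file is hard-linked under the names cp, mv and rm. Every name resolves to the same inode, and the program chooses its behavior purely from the name in argv[0].

```python
import os
import tempfile

# Hypothetical table mapping invocation names to operations; a real
# multi-call program such as BusyBox has its own, much larger table.
OPERATIONS = {"cp": "copy", "mv": "move", "rm": "remove"}

def dispatch(argv0):
    """Choose an operation from the name the program was invoked by (argv[0])."""
    name = os.path.basename(argv0)
    if name not in OPERATIONS:
        raise ValueError(f"{name}: unknown command name")
    return OPERATIONS[name]

def install_names(program, bindir):
    """Hard-link one program file under every name in OPERATIONS."""
    paths = []
    for name in OPERATIONS:
        path = os.path.join(bindir, name)
        os.link(program, path)
        paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as d:
    program = os.path.join(d, "multicall")
    open(program, "w").close()            # stand-in for the real executable
    for path in install_names(program, d):
        # Every name refers to the same inode; only argv[0] differs.
        print(path, os.stat(path).st_ino == os.stat(program).st_ino, dispatch(path))
```

A real multi-call binary would then branch on the result, for example running its copy routine for "copy". BusyBox applies the same idea in C.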

I should point out that on most filesystems, users aren't allowed to make hard links to directories. The . and .. entries are automatically managed by the filesystem code, which is typically part of the kernel. The restriction exists because it is possible to cause serious filesystem problems if you aren't careful with how you create and use directory hard links. This is one of many reasons soft links exist; they don't carry the same risk.
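You can watch the kernel enforce this restriction. This minimal Python sketch assumes a Unix-like system and uses hypothetical directory names. It calls os.link(), a thin wrapper around link(2), on a directory. On Linux the link(2) man page documents EPERM for this case, even for root. Other systems may differ in the details, so the sketch just reports whatever error comes back.

```python
import errno
import os
import tempfile

def try_directory_hard_link(workdir):
    """Attempt link(2) on a directory; return the errno reported, or None if it succeeded."""
    target = os.path.join(workdir, "dir")
    alias = os.path.join(workdir, "dir-alias")
    os.mkdir(target)
    try:
        os.link(target, alias)
    except OSError as exc:
        return exc.errno
    return None  # would mean this filesystem allowed a directory hard link

with tempfile.TemporaryDirectory() as d:
    err = try_directory_hard_link(d)
    print(errno.errorcode.get(err, err))  # symbolic name, e.g. EPERM on Linux
```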

Related Solutions

Filesystems – Why Hard Links to Directories Are Not Allowed in UNIX/Linux

This is just a bad idea, as there is no way to tell the difference between a hard link and an original name.

Allowing hard links to directories would break the directed acyclic graph structure of the filesystem, possibly creating directory loops and dangling directory subtrees, which would make fsck and any other file tree walkers error-prone.

First, to understand this, let's talk about inodes. The data in the filesystem is held in blocks on the disk, and those blocks are collected together by an inode. You can think of the inode as THE file. Inodes lack filenames, though. That's where links come in.

A link is just a pointer to an inode. A directory is an inode that holds links. Each filename in a directory is just a link to an inode. Opening a file in Unix also creates a kind of link, but a different type: an unnamed one. The open file keeps a reference to the inode even though no directory entry is involved.
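The model described here can be sketched as a toy data structure. This is not how any real filesystem lays out its data. Inode, Directory, link() and unlink() are hypothetical names chosen for illustration, and the inode number is just an example. Inodes hold data and a link count but no name, and a directory is itself an inode whose contents map names to inodes.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    """Toy inode: owns the data and a count of named links, but no filename."""
    number: int
    data: bytes = b""
    nlink: int = 0

@dataclass
class Directory(Inode):
    """Toy directory: itself an inode, whose content maps names to inodes."""
    entries: dict = field(default_factory=dict)

    def link(self, name, inode):
        """Add a named link (directory entry) to an existing inode."""
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = inode
        inode.nlink += 1

    def unlink(self, name):
        """Remove a name; return the inode's remaining link count."""
        inode = self.entries.pop(name)
        inode.nlink -= 1
        # A real filesystem frees the inode's blocks only when nlink reaches 0
        # and no process still has the file open; this model just counts.
        return inode.nlink

root = Directory(number=2)
f = Inode(number=14445750, data=b"hello")
root.link("test", f)
root.link("test2", f)
print(f.nlink)                                        # 2
print(root.entries["test"] is root.entries["test2"])  # True
```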

A hard link is just an extra directory entry pointing to an existing inode. When you run ls -l, the number after the permissions is the named link count. Most regular files have one link. Creating a new hard link to a file makes both filenames point to the same inode. Note:

% ls -l test
ls: test: No such file or directory
% touch test
% ls -l test
-rw-r--r--  1 danny  staff  0 Oct 13 17:58 test
% ln test test2
% ls -l test*
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test2
% touch test3
% ls -l test*
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test2
-rw-r--r--  1 danny  staff  0 Oct 13 17:59 test3
            ^
            ^ this is the link count
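The session above can be reproduced from Python, which makes the link count easy to check programmatically. This is a minimal sketch assuming a Unix-like system. os.link() plays the role of ln, creating an empty file plays the role of touch, and os.stat().st_nlink is the number ls -l prints.

```python
import os
import tempfile

def link_counts(workdir):
    """Recreate the test/test2/test3 session and return each name's link count."""
    test, test2, test3 = (os.path.join(workdir, n) for n in ("test", "test2", "test3"))
    open(test, "w").close()    # touch test
    os.link(test, test2)       # ln test test2
    open(test3, "w").close()   # touch test3
    return {os.path.basename(p): os.stat(p).st_nlink for p in (test, test2, test3)}

with tempfile.TemporaryDirectory() as d:
    print(link_counts(d))  # {'test': 2, 'test2': 2, 'test3': 1}
```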

Now you can see that there is nothing special about a hard link: it is the same as a regular name. In the above example, which of test and test2 is the original file, and which is the hard link? By the end, you can't really tell (even by timestamps), because both names point to the same contents, the same inode:

% ls -li test*  
14445750 -rw-r--r--  2 danny  staff  0 Oct 13 17:58 test
14445750 -rw-r--r--  2 danny  staff  0 Oct 13 17:58 test2
14445892 -rw-r--r--  1 danny  staff  0 Oct 13 17:59 test3

The -i flag to ls shows the inode number at the beginning of each line. Note how test and test2 have the same inode number, but test3 has a different one.
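To see that neither name is the "real" one, here is a minimal Python sketch. It assumes a Unix-like system, and the helper name names_are_equal() is hypothetical. It writes through test2, reads the change back through test, then deletes test. The data survives under test2, and the link count simply drops back to 1. os.path.samefile() performs the same device-and-inode comparison you can eyeball in the ls -li output.

```python
import os
import tempfile

def names_are_equal(workdir):
    """Show that neither name of a hard-linked pair is special."""
    test = os.path.join(workdir, "test")
    test2 = os.path.join(workdir, "test2")
    open(test, "w").close()
    os.link(test, test2)

    same_inode = os.path.samefile(test, test2)  # compares st_dev and st_ino

    with open(test2, "w") as f:                 # write through the "link"...
        f.write("written via test2\n")
    with open(test) as f:                       # ...read through the "original"
        seen_via_test = f.read()

    os.remove(test)                             # drop the "original" name
    with open(test2) as f:
        survivor = f.read()
    return same_inode, seen_via_test, survivor, os.stat(test2).st_nlink

with tempfile.TemporaryDirectory() as d:
    print(names_are_equal(d))
```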

Now, if you were allowed to do this for directories, two different directory entries at different points in the filesystem could point to the same directory inode. In fact, a subdirectory could point back to its grandparent, creating a loop.

Why is this loop a concern? Because when you are traversing the tree, there is no way to detect that you are looping without keeping track of the inode numbers you have already visited. Imagine you are writing the du command, which needs to recurse through subdirectories to find out about disk usage. How would du know when it hit a loop? That is a lot of error-prone bookkeeping for du to do just to pull off this simple task.
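The bookkeeping du would need is roughly this: remember the (device, inode) pair of every directory visited and refuse to enter one twice. The kernel won't let us create a directory hard link. So this minimal Python sketch simulates the grandparent loop with a symlink that the walker deliberately follows. It assumes a Unix-like system, and walk_dirs_once() is a hypothetical helper.

```python
import os
import tempfile

def walk_dirs_once(top):
    """Yield each directory reachable from top, following symlinks,
    but never entering the same (st_dev, st_ino) directory twice."""
    visited = set()
    stack = [top]
    while stack:
        path = stack.pop()
        st = os.stat(path)                  # follows symlinks
        key = (st.st_dev, st.st_ino)        # inode numbers are only unique per device
        if key in visited:
            continue                        # already seen: a loop or a second name
        visited.add(key)
        yield path
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a", "b"))
    # Stand-in for a directory hard link: a/b/loop points back at its grandparent.
    os.symlink(top, os.path.join(top, "a", "b", "loop"))
    for p in walk_dirs_once(top):
        print(os.path.relpath(p, top))
```

Without the visited set, this walker would keep descending through a/b/loop/a/b/loop/... until the operating system gave up on the ever-longer path with an error. The set costs memory proportional to the number of directories, which is the extra bookkeeping described above.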

Symlinks are a whole different beast, in that they are a special type of "file" that many filesystem APIs tend to follow automatically. Note that a symlink can point to a nonexistent destination, because it points by name rather than directly to an inode. That concept doesn't make sense for hard links, because the mere existence of a "hard link" means the file exists.
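Here is a quick way to see the difference, as a minimal Python sketch. It assumes a Unix-like system, and the file names are hypothetical. Creating a symlink to a path that does not exist succeeds. Creating a hard link to the same path fails with FileNotFoundError, because there is no inode to link to. os.path.lexists() checks the link itself, while os.path.exists() follows it.

```python
import os
import tempfile

def dangling_demo(workdir):
    """Compare a symlink and a hard link to a path that does not exist."""
    missing = os.path.join(workdir, "does-not-exist")
    sym = os.path.join(workdir, "sym")
    hard = os.path.join(workdir, "hard")

    os.symlink(missing, sym)      # allowed: a symlink stores only a name
    try:
        os.link(missing, hard)    # fails: there is no inode to point at
        hard_ok = True
    except FileNotFoundError:
        hard_ok = False

    return {
        "symlink entry exists": os.path.lexists(sym),   # the link itself is there
        "symlink target exists": os.path.exists(sym),   # following it fails
        "hard link created": hard_ok,
    }

with tempfile.TemporaryDirectory() as d:
    print(dangling_demo(d))
```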

So why can du deal with symlinks easily but not with hard links? As we saw above, hard links are indistinguishable from normal directory entries. Symlinks, however, are special, detectable, and skippable! du notices that the symlink is a symlink and does not follow it; it counts only the link itself (the 4.0K for test2 below), not the directory it points to.

% ls -l 
total 4
drwxr-xr-x  3 danny  staff  102 Oct 13 18:14 test1/
lrwxr-xr-x  1 danny  staff    5 Oct 13 18:13 test2@ -> test1
% du -ah
242M    ./test1/bigfile
242M    ./test1
4.0K    ./test2
242M    .
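Here is a rough Python sketch of the du behavior shown above. It is not du's actual implementation. disk_usage() is a hypothetical helper that measures symlinks with os.lstat() instead of following them. Like GNU du by default, it counts each (device, inode) pair only once, so a second hard link to bigfile adds nothing. The layout mirrors the listing above, with an extra hard link added. st_blocks is assumed to be in 512-byte units, as on Linux and macOS.

```python
import os
import tempfile

def disk_usage(top):
    """Rough sketch of du -s: return (allocated_bytes, distinct_inodes) under top.

    Symlinks are measured with lstat() and never followed, and each
    (st_dev, st_ino) pair is counted once, so extra hard links add nothing.
    """
    seen = set()
    total = 0
    # os.walk() does not descend into symlinked directories unless followlinks=True.
    for dirpath, dirnames, filenames in os.walk(top):
        paths = [dirpath] + [os.path.join(dirpath, n) for n in dirnames + filenames]
        for path in paths:
            st = os.lstat(path)           # the link itself, not its target
            key = (st.st_dev, st.st_ino)
            if key in seen:               # another name for an inode already counted
                continue
            seen.add(key)
            total += st.st_blocks * 512   # st_blocks assumed to be 512-byte units
    return total, len(seen)

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "test1"))
    big = os.path.join(top, "test1", "bigfile")
    with open(big, "wb") as f:
        f.write(os.urandom(1 << 20))      # 1 MiB of incompressible data
    os.link(big, os.path.join(top, "test1", "bigfile-link"))  # hard link: counted once
    os.symlink("test1", os.path.join(top, "test2"))            # symlink: not followed
    print(disk_usage(top))
```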

Hard Links – How to Dereference Hard Links

By default, if you tell tar to archive a file with hard links, and more than one such link is included among the files to be archived, it archives the file only once, and records the second (and any additional names) as hard links. This means that when you extract that archive, the hard links will be restored.

If you use the --hard-dereference option, then tar does not preserve hard links. Instead, it treats them as independent files that just happen to have the same contents and metadata. When you extract the archive, the files will be independent.

Note: tar recognizes hard links by first checking the link count of each file. It records the device number and inode number of every file with more than one link, and uses them to detect when the same file is being archived again. (With --hard-dereference, it does not do this.)
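Python's standard tarfile module uses the same detection idea. For a regular file whose link count is above one, it remembers the (inode, device) pair and stores later names for that inode as hard-link members. This minimal sketch archives test and its hard link test2 both ways. It assumes a Unix-like system, and archive_member_types() is a hypothetical helper. Passing dereference=True roughly corresponds to --hard-dereference. Note that in tarfile the same flag also makes it follow symbolic links, so it is not an exact equivalent of the GNU tar option.

```python
import io
import os
import tarfile
import tempfile

def archive_member_types(workdir, dereference):
    """Archive test and its hard link test2; describe how each member was stored."""
    test = os.path.join(workdir, "test")
    test2 = os.path.join(workdir, "test2")
    with open(test, "w") as f:
        f.write("same inode\n")
    os.link(test, test2)

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", dereference=dereference) as tar:
        tar.add(test, arcname="test")
        tar.add(test2, arcname="test2")   # second name for an inode already archived

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return {m.name: (f"hardlink -> {m.linkname}" if m.islnk() else "file")
                for m in tar.getmembers()}

for deref in (False, True):
    with tempfile.TemporaryDirectory() as d:
        print(deref, archive_member_types(d, deref))
```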
