I know what hard links are, but why would I use them? What is the utility of a hard link?
Filesystems – Why Hard Links Exist
filesystemshard link
Related Solutions
This is just a bad idea, as there is no way to tell the difference between a hard link and an original name.
Allowing hard links to directories would break the directed acyclic graph structure of the filesystem, possibly creating directory loops and dangling directory subtrees, which would make fsck
and any other file tree walkers error prone.
First, to understand this, let's talk about inodes. The data in the filesystem is held in blocks on the disk, and those blocks are collected together by an inode. You can think of the inode as THE file. Inodes lack filenames, though. That's where links come in.
A link is just a pointer to an inode. A directory is an inode that holds links. Each filename in a directory is just a link to an inode. Opening a file in Unix also creates a link, but it's a different type of link (it's not a named link).
A hard link is just an extra directory entry pointing to that inode. When you ls -l
, the number after the permissions is the named link count. Most regular files will have one link. Creating a new hard link to a file will make both filenames point to the same inode. Note:
% ls -l test
ls: test: No such file or directory
% touch test
% ls -l test
-rw-r--r-- 1 danny staff 0 Oct 13 17:58 test
% ln test test2
% ls -l test*
-rw-r--r-- 2 danny staff 0 Oct 13 17:58 test
-rw-r--r-- 2 danny staff 0 Oct 13 17:58 test2
% touch test3
% ls -l test*
-rw-r--r-- 2 danny staff 0 Oct 13 17:58 test
-rw-r--r-- 2 danny staff 0 Oct 13 17:58 test2
-rw-r--r-- 1 danny staff 0 Oct 13 17:59 test3
^
^ this is the link count
Now, you can clearly see that there is no such thing as a hard link. A hard link is the same as a regular name. In the above example, test
or test2
, which is the original file and which is the hard link? By the end, you can't really tell (even by timestamps) because both names point to the same contents, the same inode:
% ls -li test*
14445750 -rw-r--r-- 2 danny staff 0 Oct 13 17:58 test
14445750 -rw-r--r-- 2 danny staff 0 Oct 13 17:58 test2
14445892 -rw-r--r-- 1 danny staff 0 Oct 13 17:59 test3
The -i
flag to ls
shows you inode numbers in the beginning of the line. Note how test
and test2
have the same inode number,
but test3
has a different one.
Now, if you were allowed to do this for directories, two different directories in different points in the filesystem could point to the same thing. In fact, a subdir could point back to its grandparent, creating a loop.
Why is this loop a concern? Because when you are traversing, there is no way to detect you are looping (without keeping track of inode numbers as you traverse). Imagine you are writing the du
command, which needs to recurse through subdirs to find out about disk usage. How would du
know when it hit a loop? It is error prone and a lot of bookkeeping that du
would have to do, just to pull off this simple task.
Symlinks are a whole different beast, in that they are a special type of "file" that many file filesystem APIs tend to automatically follow. Note, a symlink can point to a nonexistent destination, because they point by name, and not directly to an inode. That concept doesn't make sense with hard links, because the mere existence of a "hard link" means the file exists.
So why can du
deal with symlinks easily and not hard links? We were able to see above that hard links are indistinguishable from normal directory entries. Symlinks, however, are special, detectable, and skippable!
du
notices that the symlink is a symlink, and skips it completely!
% ls -l
total 4
drwxr-xr-x 3 danny staff 102 Oct 13 18:14 test1/
lrwxr-xr-x 1 danny staff 5 Oct 13 18:13 test2@ -> test1
% du -ah
242M ./test1/bigfile
242M ./test1
4.0K ./test2
242M .
By default, if you tell tar
to archive a file with hard links, and more than one such link is included among the files to be archived, it archives the file only once, and records the second (and any additional names) as hard links. This means that when you extract that archive, the hard links will be restored.
If you use the --hard-dereference
option, then tar
does not preserve hard links. Instead, it treats them as independent files that just happen to have the same contents and metadata. When you extract the archive, the files will be independent.
Note: It recognizes hard links by first checking the link count of the file. It records the device number and inode of each file with more than one link, and uses that to detect when the same file is being archived again. (When you use --hard-dereference
, it does not do this.)
Best Answer
The main advantage of hard links is that, compared to soft links, there is no size or speed penalty. Soft links are an extra layer of indirection on top of normal file access; the kernel has to dereference the link when you open the file, and this takes a small amount of time. The link also takes a small amount of space on the disk, to hold the text of the link. These penalties do not exist with hard links because they are built into the very structure of the filesystem.
The best way I know of to see this is:
The
-i
option tols
makes it give you the inode number of the file. On the system where I prepared the example above, I happened to be in a directory with inode number 1069765, but the specific value doesn't matter. It's just a unique value that identifies a particular file/directory.What this says is that when we go into a subdirectory and look at a different filesystem entry called
..
, it has the same inode number we got before. This isn't happening because the shell is interpreting..
for you, as happens with MS-DOS and Windows. On Unix filesystems..
is a real directory entry; it is a hard link pointing back to the previous directory.Hard links are the tendons that tie the filesystem's directories together. Once upon a time, Unix didn't have hard links. They were added to turn Unix's original flat file system into a hierarchical filesystem.
(For more on this, see Why does '/' have an '..' entry?.)
It is also somewhat common on Unix systems for several different commands to be implemented by the same executable. It doesn't seem to be the case on Linux any more, but on systems I used in the past,
cp
,mv
andrm
were all the same executable. It makes sense if you think about it: when you move a file between volumes, it is effectively a copy followed by a removal, somv
already had to implement the other two commands' functions. The executable can figure out which operation to provide because it gets passed the name it was called by.Another example, common in embedded Linuxes, is BusyBox, a single executable that implements dozens of commands.
I should point out that on most filesystems, users aren't allowed to make hard links to directories. The
.
and..
entries are automatically managed by the filesystem code, which is typically part of the kernel. The restriction exists because it is possible to cause serious filesystem problems if you aren't careful with how you create and use directory hard links. This is one of many reasons soft links exist; they don't carry the same risk.