What’s the difference between hard links and copied files?


My understanding is that hard links include a copy of the original file, and that I could delete a hard-linked file in one location, and it would still exist in the other location.

If that's the case, why would I want to use hard links at all? Why not just have two separate files?

Best Answer

If you copy a file, it will duplicate the content. So if you modify the content of a single file, that has no effect on the other one.

If you make a hard link, that creates another name (directory entry) for the same file, that is, the same inode and the same content. So if you change the content through either name, the change will be seen through both.
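A minimal Python sketch of the difference, assuming a POSIX system and a filesystem that supports hard links (the file names are hypothetical). It copies one file, hard-links it under a second name, then rewrites the original in place:

```python
import os
import shutil
import tempfile


def copy_vs_hardlink():
    """Return what a copy and a hard link see after the original is rewritten."""
    with tempfile.TemporaryDirectory() as tmp:
        orig = os.path.join(tmp, "orig")
        copied = os.path.join(tmp, "copied")
        linked = os.path.join(tmp, "linked")

        with open(orig, "w") as f:
            f.write("v1")
        shutil.copyfile(orig, copied)  # new inode holding a duplicate of the data
        os.link(orig, linked)          # second name for the same inode

        with open(orig, "w") as f:     # truncate and rewrite the same inode
            f.write("v2")

        with open(copied) as f:
            seen_by_copy = f.read()
        with open(linked) as f:
            seen_by_link = f.read()
        return seen_by_copy, seen_by_link


print(copy_vs_hardlink())  # ('v1', 'v2')
```

The change shows up through the hard link only because the original was rewritten in place. A program that saves by writing a new file and renaming it over the old name gives that name a new inode, and the two names then stop sharing content.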

Related Solutions

Filesystems – Why Hard Links to Directories Are Not Allowed in UNIX/Linux

This is just a bad idea, as there is no way to tell the difference between a hard link and an original name.

Allowing hard links to directories would break the directed acyclic graph structure of the filesystem, possibly creating directory loops and dangling directory subtrees, which would make fsck and any other file tree walkers error prone.
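You can watch the kernel enforce this rule. A small Python sketch, assuming Linux, where the link(2) manual page lists EPERM for the case where the source is a directory; the directory names are made up:

```python
import errno
import os
import tempfile


def try_hardlink_directory():
    """Try to hard-link a directory; return the errno name if refused, else None."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dir")
        os.mkdir(src)
        try:
            os.link(src, os.path.join(tmp, "dir_link"))
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))
        return None  # not expected on Linux


print(try_hardlink_directory())  # EPERM on Linux
```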

First, to understand this, let's talk about inodes. The data in the filesystem is held in blocks on the disk, and those blocks are collected together by an inode. You can think of the inode as THE file. Inodes lack filenames, though. That's where links come in.

A link is just a pointer to an inode. A directory is an inode that holds links. Each filename in a directory is just a link to an inode. Opening a file in Unix also creates a link, but it's a different type of link (it's not a named link).

A hard link is just an extra directory entry pointing to that inode. When you run ls -l, the number after the permissions is the named link count. Most regular files have one link. Creating a new hard link to a file makes both filenames point to the same inode. Note:

% ls -l test
ls: test: No such file or directory
% touch test
% ls -l test
-rw-r--r--  1 danny  staff  0 Oct 13 17:58 test
% ln test test2
% ls -l test*
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test2
% touch test3
% ls -l test*
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test
-rw-r--r--  2 danny  staff  0 Oct 13 17:58 test2
-rw-r--r--  1 danny  staff  0 Oct 13 17:59 test3
            ^
            ^ this is the link count
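The same link counts can be read programmatically from st_nlink. A Python sketch replaying the session above in a temporary directory (it assumes a POSIX filesystem with hard-link support):

```python
import os
import tempfile


def link_counts():
    """Replay touch/ln/touch and return the link count of each name."""
    with tempfile.TemporaryDirectory() as tmp:
        test, test2, test3 = (os.path.join(tmp, n) for n in ("test", "test2", "test3"))
        open(test, "w").close()            # touch test
        before = os.stat(test).st_nlink    # one name so far
        os.link(test, test2)               # ln test test2
        open(test3, "w").close()           # touch test3
        return (before,
                os.stat(test).st_nlink,
                os.stat(test2).st_nlink,
                os.stat(test3).st_nlink)


print(link_counts())  # (1, 2, 2, 1)
```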

Now you can clearly see that there is no such thing as a hard link as opposed to a name: a hard link is just a regular name. In the example above, which of test and test2 is the original file and which is the hard link? Once both exist, you can't really tell (not even by timestamps), because both names point to the same contents, the same inode:

% ls -li test*  
14445750 -rw-r--r--  2 danny  staff  0 Oct 13 17:58 test
14445750 -rw-r--r--  2 danny  staff  0 Oct 13 17:58 test2
14445892 -rw-r--r--  1 danny  staff  0 Oct 13 17:59 test3

The -i flag to ls shows the inode number at the beginning of each line. Note how test and test2 have the same inode number, but test3 has a different one.
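The inode number is available as st_ino from os.stat. This Python sketch (same hypothetical names as the session) checks which names share an inode; os.path.samefile performs the same comparison using the device and inode numbers:

```python
import os
import tempfile


def inode_sharing():
    """Return (test/test2 share an inode, test/test3 share an inode, samefile(test, test2))."""
    with tempfile.TemporaryDirectory() as tmp:
        test, test2, test3 = (os.path.join(tmp, n) for n in ("test", "test2", "test3"))
        open(test, "w").close()
        os.link(test, test2)
        open(test3, "w").close()
        ino = {p: os.stat(p).st_ino for p in (test, test2, test3)}
        return (ino[test] == ino[test2],
                ino[test] == ino[test3],
                os.path.samefile(test, test2))


print(inode_sharing())  # (True, False, True)
```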

Now, if you were allowed to do this for directories, two different directories at different points in the filesystem could point to the same thing. In fact, a subdirectory could point back to its grandparent, creating a loop.

Why is this loop a concern? Because when you are traversing the tree, there is no way to detect that you are looping (without keeping track of inode numbers as you go). Imagine you are writing the du command, which needs to recurse through subdirectories to find out about disk usage. How would du know when it hit a loop? It would be error-prone and would require a lot of bookkeeping in du, just to pull off this simple task.
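Here is a Python sketch of the bookkeeping such a walker would need. Because the kernel won't create directory hard links, it fakes the grandparent loop with a symlink and deliberately follows symlinks; the walker remembers each directory's (st_dev, st_ino) pair and refuses to enter one twice. All names are hypothetical:

```python
import os
import tempfile


def count_dirs(root):
    """Count distinct directories reachable from root, following symlinks.

    A directory is identified by (st_dev, st_ino); one that was already
    visited is not entered again, which is what stops an endless loop.
    """
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        st = os.stat(path)                 # follows symlinks
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue                       # loop (or second path) detected
        seen.add(key)
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir():         # also true for symlinks to directories
                    stack.append(entry.path)
    return len(seen)


def grandparent_loop_demo():
    with tempfile.TemporaryDirectory() as tmp:
        top = os.path.join(tmp, "grandparent")
        child = os.path.join(top, "parent", "child")
        os.makedirs(child)
        # child/loop -> ../.. resolves back to grandparent
        os.symlink(os.path.join("..", ".."), os.path.join(child, "loop"))
        return count_dirs(top)


print(grandparent_loop_demo())  # 3
```

Without the seen set, the walk would re-enter grandparent through child/loop forever.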

Symlinks are a whole different beast, in that they are a special type of "file" that many filesystem APIs tend to follow automatically. Note that a symlink can point to a nonexistent destination, because it points by name and not directly to an inode. That concept doesn't make sense with hard links, because the mere existence of a "hard link" means the file exists.
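A short Python sketch of that asymmetry, assuming a POSIX system (the names are hypothetical): creating a symlink to a missing name succeeds and leaves it dangling, while a hard link to a missing name fails because there is no inode to link to.

```python
import os
import tempfile


def dangling_vs_hard():
    """Return (symlink to a missing name is dangling, hard link to a missing name succeeded)."""
    with tempfile.TemporaryDirectory() as tmp:
        sym = os.path.join(tmp, "sym")
        os.symlink("no-such-file", sym)    # allowed: a symlink only stores a name
        dangling = os.path.lexists(sym) and not os.path.exists(sym)
        try:
            os.link(os.path.join(tmp, "no-such-file"), os.path.join(tmp, "hard"))
            hard_ok = True
        except FileNotFoundError:
            hard_ok = False                # nothing to link: no inode exists
        return dangling, hard_ok


print(dangling_vs_hard())  # (True, False)
```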

So why can du deal with symlinks easily but not with hard links? We saw above that hard links are indistinguishable from normal directory entries. Symlinks, however, are special, detectable, and skippable! du notices that the symlink is a symlink and does not follow it: it counts only the small symlink itself (4.0K for test2 below), not the directory it points to.

% ls -l 
total 4
drwxr-xr-x  3 danny  staff  102 Oct 13 18:14 test1/
lrwxr-xr-x  1 danny  staff    5 Oct 13 18:13 test2@ -> test1
% du -ah
242M    ./test1/bigfile
242M    ./test1
4.0K    ./test2
242M    .

Hard Links – Why Do Hard Links Seem to Take the Same Space as Originals?

A file is an inode with metadata, including a list of pointers to where its data is stored.

In order to be able to access a file, you have to link it to a directory (think of directories as phone directories, not folders), that is, add one or more entries to one or more directories to associate a name with that file.

All those links, those file names, point to the same file. There's not one that is the original and others that are links. They are all access points to the same file (the same inode) in the directory tree. When you get the size of the file (with the lstat system call), you're retrieving information (the metadata referred to above) stored in the inode, so it doesn't matter which file name, which link, you use to refer to that file.
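A Python sketch of that point, assuming a POSIX filesystem with hard links (names hypothetical): grow the file through one name, and lstat reports the same size through either name, because the size lives in the shared inode.

```python
import os
import tempfile


def size_via_each_name():
    """Grow a file through one name and read its size (lstat) through both."""
    with tempfile.TemporaryDirectory() as tmp:
        a, b = os.path.join(tmp, "a"), os.path.join(tmp, "b")
        with open(a, "w") as f:
            f.write("hello")
        os.link(a, b)
        with open(b, "a") as f:            # append through the second name
            f.write(" world")
        return os.lstat(a).st_size, os.lstat(b).st_size


print(size_via_each_name())  # (11, 11)
```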

By contrast, symlinks are another file (another inode) whose content is a path to the target file. Like any other file, a symlink has to be linked to a directory (it must have a name) so you can access it. You can also have several links to a symlink; in other words, a symlink can be given several names (in one or more directories).

$ touch a
$ ln a b
$ ln -s a c
$ ln c d
$ ls -li [a-d]
10486707 -rw-r--r-- 2 stephane stephane 0 Aug 27 17:05 a
10486707 -rw-r--r-- 2 stephane stephane 0 Aug 27 17:05 b
10502404 lrwxrwxrwx 2 stephane stephane 1 Aug 27 17:05 c -> a
10502404 lrwxrwxrwx 2 stephane stephane 1 Aug 27 17:05 d -> a

Above, file number 10486707 is a regular file. Two entries in the current directory (one named a, one named b) link to it. Because the link count is 2, we know that file has no other name in the current directory or in any other directory. File number 10502404 is another file, this time of type symlink, linked twice into the current directory. Its content (target) is the relative path "a".
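The same listing can be reproduced from Python, as a sketch assuming Linux. os.link with follow_symlinks=False hard-links the symlink itself rather than its target, which is what the transcript's ln c d produced; lstat is used so the symlink is inspected and not followed:

```python
import os
import stat
import tempfile


def hardlinked_symlink():
    """Replay `touch a; ln a b; ln -s a c; ln c d` and inspect c and d with lstat."""
    with tempfile.TemporaryDirectory() as tmp:
        a, b, c, d = (os.path.join(tmp, n) for n in "abcd")
        open(a, "w").close()                   # touch a
        os.link(a, b)                          # ln a b
        os.symlink("a", c)                     # ln -s a c
        os.link(c, d, follow_symlinks=False)   # ln c d: link the symlink itself
        sa, sc, sd = os.lstat(a), os.lstat(c), os.lstat(d)
        return (sc.st_ino == sd.st_ino,        # c and d are one inode
                sc.st_ino != sa.st_ino,        # ... distinct from a's inode
                stat.S_ISLNK(sd.st_mode),      # ... of type symlink
                sc.st_nlink,                   # ... with two names
                os.readlink(d))                # ... whose content is "a"


print(hardlinked_symlink())  # (True, True, True, 2, 'a')
```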

Note that if 10502404 were linked into a directory other than the current one, it would typically point to a different file depending on which name it was accessed through.

$ mkdir 1 2
$ echo foo > 1/a
$ echo bar > 2/a
$ ln -s a 1/b
$ ln 1/b 2/b
$ ls -lia 1 2
1:
total 92
10608644 drwxr-xr-x   2 stephane stephane  4096 Aug 27 17:26 ./
10485761 drwxrwxr-x 443 stephane stephane 81920 Aug 27 17:26 ../
10504186 -rw-r--r--   1 stephane stephane     4 Aug 27 17:24 a
10539259 lrwxrwxrwx   2 stephane stephane     1 Aug 27 17:26 b -> a

2:
total 92
10608674 drwxr-xr-x   2 stephane stephane  4096 Aug 27 17:26 ./
10485761 drwxrwxr-x 443 stephane stephane 81920 Aug 27 17:26 ../
10539044 -rw-r--r--   1 stephane stephane     4 Aug 27 17:24 a
10539259 lrwxrwxrwx   2 stephane stephane     1 Aug 27 17:26 b -> a
$ cat 1/b
foo
$ cat 2/b
bar
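A Python replay of this session, as a sketch assuming Linux (os.link with follow_symlinks=False links the symlink itself). It confirms that 1/b and 2/b are one symlink inode, yet the relative target "a" resolves inside whichever directory the name was found in:

```python
import os
import tempfile


def one_symlink_two_dirs():
    """One symlink inode named 1/b and 2/b, target "a"; read through each name."""
    with tempfile.TemporaryDirectory() as tmp:
        one, two = os.path.join(tmp, "1"), os.path.join(tmp, "2")
        os.mkdir(one)
        os.mkdir(two)
        with open(os.path.join(one, "a"), "w") as f:
            f.write("foo\n")
        with open(os.path.join(two, "a"), "w") as f:
            f.write("bar\n")
        os.symlink("a", os.path.join(one, "b"))                     # ln -s a 1/b
        os.link(os.path.join(one, "b"), os.path.join(two, "b"),
                follow_symlinks=False)                              # ln 1/b 2/b
        same_inode = (os.lstat(os.path.join(one, "b")).st_ino
                      == os.lstat(os.path.join(two, "b")).st_ino)
        with open(os.path.join(one, "b")) as f:   # "a" resolved inside 1/
            via_one = f.read()
        with open(os.path.join(two, "b")) as f:   # "a" resolved inside 2/
            via_two = f.read()
        return same_inode, via_one, via_two


print(one_symlink_two_dirs())  # (True, 'foo\n', 'bar\n')
```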

Files have no names associated with them other than in the directories that link them. The space taken by their names is in the entries of those directories, and it is accounted for in the file size/disk usage of the directories.

You'll notice that the system call to remove a file is unlink. That is, you don't remove files; you unlink them from the directories they're referenced in. Once a file is unlinked from the last directory that had an entry for it, that file is destroyed (as long as no process has it open).
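A Python sketch of those semantics, assuming a POSIX system (names hypothetical): removing one of two names leaves the file with a link count of 1, and after the last name is unlinked the data stays readable through a descriptor that was already open until it is closed.

```python
import os
import tempfile


def unlink_demo():
    """Unlink names one at a time; the data lives until the last close."""
    with tempfile.TemporaryDirectory() as tmp:
        a, b = os.path.join(tmp, "a"), os.path.join(tmp, "b")
        with open(a, "w") as f:
            f.write("data")
        os.link(a, b)
        os.unlink(a)                        # remove one name; the file remains
        nlink_after = os.stat(b).st_nlink   # only b is left
        with open(b) as f:
            os.unlink(b)                    # last name gone, but f still holds the file
            exists_by_name = os.path.exists(b)
            content = f.read()              # still readable through the open descriptor
        return nlink_after, exists_by_name, content


print(unlink_demo())  # (1, False, 'data')
```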
