Do hard links really take up so much disk space?

alias, hard link, symlink

I've found that I need to use hard links with a particular program (Ableton Live) that is unable to see aliases/symlinks, which is of course how I have all my working files organized. But making hard links creates what appear to be duplicates of the original file.

Do they actually take up as much space as the original? Or is the filesystem (OSX in this case) merely showing the size of the actual data on disk, and the fact that it is being referenced in two places does not actually double the amount of data?

Best Answer

The second thing you said is exactly correct. The file contents only exist once on disk. A hard link is an extra reference, which costs very little space - the size of a directory entry, which is the length of the filename plus a few bytes.

I don't know if this applies to OSX, but in the version of GNU coreutils I have handy, du is aware of hard links, so you can use it to get an accurate report of the total size of a set of files. If it finds multiple links to a file, it only adds that file to the total once. ls -l, on the other hand, does the wrong thing: its total line adds up every entry it sees in the directory, so a file with two links there is counted twice.

$ ls -sl
total 296
296 -rw-r--r-- 1 user group 300324 Feb 17 19:08 f1
$ du
296     .
$ ln f1 f2
$ ls -sl
total 592
296 -rw-r--r-- 2 user group 300324 Feb 17 19:08 f1
296 -rw-r--r-- 2 user group 300324 Feb 17 19:08 f2
$ du
296     .
$ cp f1 f3
$ ls -sl
total 888
296 -rw-r--r-- 2 user group 300324 Feb 17 19:08 f1
296 -rw-r--r-- 2 user group 300324 Feb 17 19:08 f2
296 -rw-r--r-- 1 user group 300324 Feb 17 19:08 f3
$ du
592     .
$

The ultimate demonstration would be to create a huge file, more than half the size of the disk. Then see how many hard links you can create to it. Should be quite a lot.
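A smaller-scale version of the same check can be scripted. This is only a sketch, not something from the original answer: it works in a throwaway directory from mktemp, and the names original and link are arbitrary. It relies on ls -i for the inode number and on the second column of ls -ld for the link count, which should behave the same way on Linux and OSX.

```shell
#!/bin/sh
# Sketch: show that a hard link is a second name for the same inode.
# The directory and file names are arbitrary choices for this demo.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Create a 300 KiB file, then a hard link to it.
dd if=/dev/zero of=original bs=1024 count=300 2>/dev/null
ln original link

# Inode number of each name (first field of `ls -i`).
inode_original=$(ls -i original | awk '{print $1}')
inode_link=$(ls -i link | awk '{print $1}')

# Link count is the second field of `ls -ld`.
link_count=$(ls -ld original | awk '{print $2}')

echo "inode of original: $inode_original"
echo "inode of link:     $inode_link"
echo "link count:        $link_count"

# du is expected to count the shared data once; the exact figure
# depends on the filesystem, so it is printed rather than checked.
du -sk .

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

If both names report the same inode number and the link count is 2, there is one file with two directory entries, not two copies of the data.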

Related Solutions

Symlink Hard Link – Difference Between Symbolic and Hard Links

The different semantics between hard and soft links make them suitable for different things.

Hard links:

indistinguishable from other directory entries, because every directory entry is a hard link
"original" can be moved or deleted without breaking other hard links to the same inode
only possible within the same filesystem
permissions must be the same as those on the "original" (permissions are stored in the inode, not the directory entry)
can only be made to files, not directories

Symbolic links (soft links):

simply records that point to another file path. (ls -l will show what path a symlink points to)
will break if original is moved or deleted. (In some cases it is actually desirable for a link to point to whatever file currently occupies a particular location)
can point to a file in a different filesystem
can point to a directory
on some file system formats, it is possible for the symlink to have different permissions than the file it points to (this is uncommon)
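The difference in what happens when the "original" goes away is easy to try out. The following is a minimal sketch under a few assumptions: it runs in a scratch directory from mktemp, and the names original, hard and soft are made up for the demo.

```shell
#!/bin/sh
# Sketch: what happens to each kind of link when the "original" name goes away.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

echo "some data" > original
ln original hard        # second directory entry for the same inode
ln -s original soft     # separate file whose content is the path "original"

rm original             # remove the first name only

# The hard link still reaches the data.
hard_content=$(cat hard)
echo "hard link reads: $hard_content"

# The symlink still exists as a link, but its target path no longer resolves.
if [ -L soft ] && [ ! -e soft ]; then
    soft_state="dangling"
else
    soft_state="ok"
fi
echo "symlink is: $soft_state"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Here -L tests the link itself while -e follows it, which is why the two tests disagree once the target is gone.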

Hard Links – Why Do Hard Links Seem to Take the Same Space as Originals?

A file is an inode: a record holding metadata, including a list of pointers to where the data can be found.

In order to be able to access a file, you have to link it to a directory (think of directories as phone directories, not folders), that is, add one or more entries to one or more directories to associate a name with that file.

All those links, all those file names, point to the same file. None of them is the original with the others being links; they are all access points to the same file (same inode) in the directory tree. When you get the size of the file (lstat system call), you're retrieving information (the metadata referred to above) stored in the inode, so it doesn't matter which file name, which link, you use to refer to that file.

By contrast, a symlink is another file (another inode) whose content is a path to the target file. Like any other file, a symlink has to be linked to a directory (must have a name) so you can access it. You can also have several links to a symlink, or in other words, a symlink can be given several names (in one or more directories).

$ touch a
$ ln a b
$ ln -s a c
$ ln c d
$ ls -li [a-d]
10486707 -rw-r--r-- 2 stephane stephane 0 Aug 27 17:05 a
10486707 -rw-r--r-- 2 stephane stephane 0 Aug 27 17:05 b
10502404 lrwxrwxrwx 2 stephane stephane 1 Aug 27 17:05 c -> a
10502404 lrwxrwxrwx 2 stephane stephane 1 Aug 27 17:05 d -> a

Above, file number 10486707 is a regular file. Two entries in the current directory (one with name a, one with name b) link to it. Because the link count is 2, we know there's no other name for that file in the current directory or any other directory. File number 10502404 is another file, this time of type symlink, linked twice to the current directory. Its content (target) is the relative path "a".

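Note that if 10502404 were linked to a directory other than the current one, it would typically point to a different file depending on which name it was accessed through, because the relative path "a" is resolved against the directory containing that name.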

$ mkdir 1 2
$ echo foo > 1/a
$ echo bar > 2/a
$ ln -s a 1/b
$ ln 1/b 2/b
$ ls -lia 1 2
1:
total 92
10608644 drwxr-xr-x   2 stephane stephane  4096 Aug 27 17:26 ./
10485761 drwxrwxr-x 443 stephane stephane 81920 Aug 27 17:26 ../
10504186 -rw-r--r--   1 stephane stephane     4 Aug 27 17:24 a
10539259 lrwxrwxrwx   2 stephane stephane     1 Aug 27 17:26 b -> a

2:
total 92
10608674 drwxr-xr-x   2 stephane stephane  4096 Aug 27 17:26 ./
10485761 drwxrwxr-x 443 stephane stephane 81920 Aug 27 17:26 ../
10539044 -rw-r--r--   1 stephane stephane     4 Aug 27 17:24 a
10539259 lrwxrwxrwx   2 stephane stephane     1 Aug 27 17:26 b -> a
$ cat 1/b
foo
$ cat 2/b
bar

Files have no names associated with them other than in the directories that link them. The space taken by their names is the space of the entries in those directories; it's accounted for in the file size/disk usage of the directories.

You'll notice that the system call to remove a file is unlink. That is, you don't remove files, you unlink them from the directories they're referenced in. Once a file has been unlinked from the last directory that had an entry for it, the file is destroyed (as long as no process has it open).
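The unlink behaviour can be watched from a shell. This is a rough sketch, assuming a scratch directory from mktemp and made-up file names a and b; file descriptor 3 is just an arbitrary free descriptor used to keep the file open.

```shell
#!/bin/sh
# Sketch: removing a name only drops one link; the data goes away with the last one.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

echo "payload" > a
ln a b

# Link count is the second field of `ls -ld`.
count_before=$(ls -ld a | awk '{print $2}')   # two names: a and b
rm a                                          # unlink one name
count_after=$(ls -ld b | awk '{print $2}')    # one name left: b

echo "links before: $count_before, after: $count_after"

# Keep the file open on descriptor 3, then unlink its last name.
exec 3< b
rm b
still_readable=$(cat <&3)   # data is still there while a descriptor is open
exec 3<&-                   # closing it lets the filesystem free the inode

echo "read after last unlink: $still_readable"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

rm here is just a front end to unlink: the first rm lowers the link count, and the second leaves the file with no name at all, yet its content stays readable until the descriptor is closed.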
