Why is there a mix of symlinks and hardlinks in /bin

hard-link symlink utilities

I understand the technical difference between symlinks and hardlinks, this is a question about their use in practice, particularly I'm curious to know why both are used in seemingly similar conditions: the /bin directory.

Here's a fragment of its listing on my system:

~$ ls -lai /bin
total 10508
32770 drwxr-xr-x  2 root root    4096 Jun 14 11:47 .
    2 drwxr-xr-x 28 root root    4096 Sep  6 13:15 ..
  119 -rwxr-xr-x  1 root root  959120 Mar 28 22:02 bash
   2820 -rwxr-xr-x  3 root root   31112 Dec 15  2011 bunzip2
  127 -rwxr-xr-x  1 root root 1832016 Nov 16  2012 busybox
   2820 -rwxr-xr-x  3 root root   31112 Dec 15  2011 bzcat
 6191 lrwxrwxrwx  1 root root       6 Dec 15  2011 bzcmp -> bzdiff
 5640 -rwxr-xr-x  1 root root    2140 Dec 15  2011 bzdiff
 5872 lrwxrwxrwx  1 root root       6 Dec 15  2011 bzegrep -> bzgrep
 3520 -rwxr-xr-x  1 root root    4877 Dec 15  2011 bzexe
 6184 lrwxrwxrwx  1 root root       6 Dec 15  2011 bzfgrep -> bzgrep
 5397 -rwxr-xr-x  1 root root    3642 Dec 15  2011 bzgrep
   2820 -rwxr-xr-x  3 root root   31112 Dec 15  2011 bzip2
 2851 -rwxr-xr-x  1 root root   10336 Dec 15  2011 bzip2recover
 6189 lrwxrwxrwx  1 root root       6 Dec 15  2011 bzless -> bzmore
 5606 -rwxr-xr-x  1 root root    1297 Dec 15  2011 bzmore

I indented the hardlinks to the same inode for better visibility.
So why are symlinks used in the case of bzcmp, bzegrep, bzfgrep, bzless and hardlinks in the case of bzip2, bzcat, bunzip2?

They are all regular files (not directories), reside inside one filesystem, are system utilities and are even made for working with the same thing: bzip archives. Are the reasons for use of hardlinks/symlinks in this particular case purely historical or am I missing something?

Clarification of my question:

I'm not asking about:

The technical differences between symlinks and hardlinks
The theoretical advantages and disadvantages of each of them

These questions have been addressed in other threads on SO.
I'm trying to understand why different decisions were made in a specific case: for a group of related system utilities. Technically, they all could've been symlinks or they all could've been hardlinks, both options would work (and in both cases a program can still figure out how it's been invoked via argv[0]). I want to understand the intent here if there is any.

Related:

Why do hard links exist?

Best Answer

Why use hardlinks vs. Symbolic links

There are primarily 3 advantages of using hardlinks over symbolic links in this scenario.

Hard links

With a hard link, the link points to the inode directly.
Hard links are like having multiple copies of the executable but only using the disk space of one.
You can rename either branch of the hard link without breaking anything.
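
The points above can be checked with a short sketch. It works in a throwaway directory, and the file names tool, alias and renamed are made up for illustration; it assumes the temporary directory's filesystem supports hard links.

```shell
#!/bin/sh
# Work in a scratch directory so nothing on the real system is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'payload\n' > tool      # the original name
ln tool alias                  # a second name (hard link) for the same inode

# Both names report the same inode number.
inode_tool=$(ls -i tool | awk '{print $1}')
inode_alias=$(ls -i alias | awk '{print $1}')
echo "tool inode:  $inode_tool"
echo "alias inode: $inode_alias"

# Renaming one name leaves the other fully usable.
mv tool renamed
content=$(cat alias)
echo "alias still reads: $content"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Neither name is "the original" once the link exists; each is an equal directory entry for the same inode, which is why the rename is harmless.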

Symbolic links

The link points to a path name (which in turn points to the inode).
They can span filesystems, whereas hardlinks cannot.
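
A minimal sketch of what pointing at a name means in practice: the symlink stores only the target's name, so renaming the target leaves the link dangling. The file names are again hypothetical, and readlink is assumed to be available (it is on common Linux and BSD systems).

```shell
#!/bin/sh
# Work in a scratch directory so nothing on the real system is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'payload\n' > tool
ln -s tool alias               # alias stores the *name* "tool", not an inode

target=$(readlink alias)
echo "alias points at the name: $target"

# Rename the target: the symlink still says "tool", which no longer exists.
mv tool renamed
if [ -e alias ]; then          # -e follows the symlink
    state=ok
else
    state=dangling
fi
echo "after rename the link is: $state"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```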

Advantages of linking in general

These links exist because many executables behave differently based on how they were called. For example, the 2 commands bzless and bzmore are actually a single executable, bzmore. The executable will behave differently depending on which name was used to invoke it.
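
As a rough illustration of name-based dispatch (not the actual bzmore source), here is a hypothetical script called bzdemo that inspects the name it was invoked under. It is linked once with a hard link and once with a symlink to show that both behave the same from the program's point of view. The sketch assumes the temporary directory allows executing files.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Hypothetical multi-call script: it picks its behaviour from the invoked name.
cat > "$dir/bzdemo" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
    bzdemo-less) echo "paging with less" ;;
    bzdemo-more) echo "paging with more" ;;
    *)           echo "default behaviour" ;;
esac
EOF
chmod +x "$dir/bzdemo"

ln    "$dir/bzdemo" "$dir/bzdemo-more"   # hard link alias
ln -s bzdemo        "$dir/bzdemo-less"   # symlink alias (relative target)

out_plain=$("$dir/bzdemo")
out_hard=$("$dir/bzdemo-more")
out_sym=$("$dir/bzdemo-less")
echo "bzdemo:      $out_plain"
echo "bzdemo-more: $out_hard"
echo "bzdemo-less: $out_sym"

rm -rf "$dir"
```

Either kind of link hands the alias name to the script, so the choice between them does not affect the dispatch itself.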

This is done for a variety of reasons. Here are some of the more obvious ones:

Easier to develop a single executable rather than many
Saves disk space
Easier to deploy

Why are both being used?

The choice of either, in this particular application, is moot. Either can facilitate the feature of acting as an alias so that a single executable can be overloaded. That's really the key feature that is getting exploited by the developers of the various programs here.

The FHS (Filesystem Hierarchy Standard) even specifies it this way: the link can be either kind.

excerpt

If /bin/sh is not a true Bourne shell, it must be a hard or symbolic link to the real shell command.

The rationale behind this is because sh and bash mightn't necessarily behave in the same manner. The use of a symbolic link also allows users to easily see that /bin/sh is not a true Bourne shell.

...

...

If the gunzip and zcat programs exist, they must be symbolic or hard links to gzip. /bin/csh may be a symbolic link to /bin/tcsh or /usr/bin/tcsh.

References

Why are reboot, shutdown and poweroff symlinks to systemctl?

Related Solutions

Hardlink Limit for One File – Is There a Maximum?

POSIX requires that the operating system understand the concept of hard links but not that hard links can actually be used in any particular circumstance. You can find out how many hard links are permitted at a particular location (this can vary by filesystem type) by calling pathconf(filename, _PC_LINK_MAX). The minimum limit (_POSIX_LINK_MAX) is 8, but this is rather meaningless as link() can report many other errors anyway (permission denied, disk full, …).
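
From a shell, the same query can be made with getconf, the command-line front end to pathconf(). This is a small sketch; the path / is only an example, and the reported value depends on the filesystem (getconf may also print "undefined" when there is no fixed limit).

```shell
#!/bin/sh
# Ask for the hard link limit that applies at a given path.
path=/
limit=$(getconf LINK_MAX "$path")
echo "LINK_MAX at $path: $limit"
```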

The stat structure stores the link count in a field of type nlink_t, so the type of this field gives an upper limit on your system. But there's a good chance you'll never be able to reach that far: it's common to have a 32-bit nlink_t but only 16 bits in many filesystems (a quick grep in the Linux source shows that ext[234], NTFS, UFS and XFS use 16-bit link counts in the kernel data structures).

Prevent Perl -i from Clobbering Symlinks – How to Guide

I wonder whether the small sponge general-purpose utility ("soak up standard input and write to a file") from moreutils will be helpful in this case and whether it will follow the symlink.

The author describes sponge like this:

It addresses the problem of editing files in-place with Unix tools, namely that if you just redirect output to the file you're trying to edit then the redirection takes effect (clobbering the contents of the file) before the first command in the pipeline gets round to reading from the file. Switches like sed -i and perl -i work around this, but not every command you might want to use in a pipeline has such an option, and you can't use that approach with multiple-command pipelines anyway.

I normally use sponge a bit like this:
sed '...' file | grep '...' | sponge file