Am I correct in my understanding of how symlinks and `..` interact under POSIX?

filenames posix symlink

I am working on a mathematical description of paths (such as file paths, but also more abstract and general).

One of the trickiest things to define is the behaviour of .. (φ in the linked post), particularly how it interacts with symlinks.

I want to check that I understand the POSIX rules correctly.

POSIX 4.14 says

When a process resolves a pathname of an existing directory entry, the entire pathname shall be resolved as described below. When a process resolves a pathname of a directory entry that is to be created immediately after the pathname is resolved, pathname resolution terminates when all components of the path prefix of the last component have been resolved. It is then the responsibility of the process to create the final component. …
Each filename in the pathname is located in the directory specified by its predecessor (for example, in the pathname fragment a/b, file b is located in directory a).

If a symbolic link is encountered during pathname resolution, …
the system shall prefix the remaining pathname, if any, with the contents of the symbolic link …
the resolved pathname shall be the resolution of the pathname just created. If the resulting pathname does not begin with a <slash>, the predecessor of the first filename of the pathname is taken to be the directory containing the symbolic link.
The special filename dot shall refer to the directory specified by its predecessor. The special filename dot-dot shall refer to the parent directory of its predecessor directory. As a special case, in the root directory, dot-dot may refer to the root directory itself. ..

So what I understand POSIX to say is that symlinks should be expanded before resolving .. to go to the parent directory.

Is this correct?

So for normalising paths, the POSIX-compliant way is the behaviour of
realpath -P
which requires all directories to exist (but not the final file component), resolves symlinks from left to right (as soon as they are encountered), and applies each .. to the directory that has already been resolved.
That is to say, the file system has to be read at every step of processing a path.
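To make the left-to-right rule concrete, here is a minimal Python sketch of that style of resolution. The function name `resolve_physical` and the `max_links` limit are made up for illustration, it only handles absolute paths, and unlike realpath it does not check that intermediate directories exist. It is a sketch of my reading of the rules, not a reference implementation.

```python
import os
import tempfile


def resolve_physical(path, max_links=40):
    """Hypothetical helper: resolve an absolute path one component at a time."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    # Stack of components still to process; the next one is at the end.
    pending = [c for c in path.split("/") if c][::-1]
    current = "/"  # symlink-free prefix resolved so far
    links_seen = 0
    while pending:
        comp = pending.pop()
        if comp == ".":
            continue
        if comp == "..":
            # current contains no symlinks, so its textual parent is its real parent
            current = os.path.dirname(current)
            continue
        candidate = os.path.join(current, comp)
        if os.path.islink(candidate):
            links_seen += 1
            if links_seen > max_links:  # arbitrary guard against symlink loops
                raise OSError("too many levels of symbolic links")
            target = os.readlink(candidate)
            if target.startswith("/"):
                current = "/"  # absolute target: restart from the root
            # Prefix the remaining pathname with the contents of the link.
            pending.extend([c for c in target.split("/") if c][::-1])
            continue
        current = candidate
    return current


# Demo in a scratch directory: symlink -> foo/bar/baz, then ask for symlink/..
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "foo", "bar", "baz"))
    os.symlink(os.path.join(base, "foo", "bar", "baz"),
               os.path.join(base, "symlink"))
    resolved = resolve_physical(os.path.join(base, "symlink", ".."))
    expected = os.path.realpath(os.path.join(base, "foo", "bar"))
    print(resolved == expected)
```

The key point is that .. is applied to `current`, which never contains an unresolved symlink, so the file system is consulted for every ordinary component.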

We can contrast this with the behaviour of node.js's path.normalize (or even its path.posix.normalize), which I think is fairly typical of many programming languages (os.path in Python 2 and 3 is similar).
It is equivalent to realpath -s -m, which is to say it completely ignores that some directories may be symlinks, or may not exist.
This is nice, since it does not have to touch the file system at all.
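For comparison, purely lexical normalisation can be sketched in a few lines of Python. The helper `lexical_normalize` is a hypothetical name, it only handles absolute paths, and it is checked against the standard library's `posixpath.normpath`, which behaves the same way on these inputs.

```python
import posixpath


def lexical_normalize(path):
    """Hypothetical helper: collapse '.', '..' and empty components textually."""
    parts = []
    for comp in path.split("/"):
        if comp in ("", "."):
            continue  # empty components and '.' add nothing
        if comp == "..":
            if parts:
                parts.pop()  # drop the previous component, whatever it is on disk
            continue
        parts.append(comp)
    return "/" + "/".join(parts)


# Neither call looks at the file system, so the paths need not exist.
collapsed = lexical_normalize("/tmp/symlink/..")
missing = lexical_normalize("/no/such/dir/../file")
print(collapsed, missing)
print(collapsed == posixpath.normpath("/tmp/symlink/.."))
```

Because the previous component is dropped without asking whether it is a symlink, the result can name a different directory than the one the kernel would reach.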

If we then take a path that has been normalised this way
and give it to a function that does touch the file system (e.g. fs.readFile),
then it is equivalent to having normalised the path using realpath -L,
which is to "resolve .. before symlinks".
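A small Python experiment shows how the two orders can end up at different files. The directory layout and file names below are invented for the demonstration, and it assumes a POSIX system where symlinks can be created.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Layout (all names are illustrative):
    #   base/target.txt          contains "logical"
    #   base/foo/bar/target.txt  contains "physical"
    #   base/link -> base/foo/bar/baz
    os.makedirs(os.path.join(base, "foo", "bar", "baz"))
    with open(os.path.join(base, "target.txt"), "w") as f:
        f.write("logical")
    with open(os.path.join(base, "foo", "bar", "target.txt"), "w") as f:
        f.write("physical")
    os.symlink(os.path.join(base, "foo", "bar", "baz"),
               os.path.join(base, "link"))

    raw = os.path.join(base, "link", "..", "target.txt")

    # Handing the raw path to the kernel: the symlink is followed before '..'.
    with open(raw) as f:
        via_kernel = f.read()

    # Normalising lexically first: 'link/..' is cancelled before any lookup.
    with open(os.path.normpath(raw)) as f:
        via_lexical = f.read()

print(via_kernel, via_lexical)
```

So normalising first is not just a cosmetic step: it can change which file gets opened.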

Bash cd also acts as though it has processed its argument with realpath -L, unless you give it the -P flag.

Am I correctly understanding the POSIX spec, and its relation to realpath, and to many (most?) programming language path libraries?

Bonus question :-P, does anyone have an example of a programming language or library (other than using shell utilities like realpath) that has a POSIX-compliant implementation of normalize?
(I believe Python 3 pathlib.Path.resolve is correct. Though it also converts relative paths to absolute paths.)
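A quick way to check this claim about pathlib is the sketch below. It builds a throwaway directory tree (names invented for the test) and compares `Path.resolve` with the lexical `os.path.normpath`. It assumes a POSIX system and Python 3.6 or later.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "foo" / "bar" / "baz").mkdir(parents=True)
    (base / "symlink").symlink_to(base / "foo" / "bar" / "baz")

    dotdot = base / "symlink" / ".."  # pathlib keeps '..' as written

    physical = dotdot.resolve()                 # follows the symlink, then '..'
    expected = (base / "foo" / "bar").resolve()
    lexical = Path(os.path.normpath(dotdot))    # cancels 'symlink/..' textually

    print(physical == expected, lexical == base)

# resolve() also makes relative paths absolute, even if they do not exist.
relative_resolved = Path("some-relative-name").resolve()
print(relative_resolved.is_absolute())
```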

Best Answer

. and .. on Unixes are real directory entries that represent real files (files or directories) that belong to their parent directory.

.. isn't a path resolution keyword to strip the last segment of the path. In order to resolve a .., you first need to resolve the directory that contains it. Nodejs is doing it wrong.

$ mkdir -p /tmp/foo/bar/baz
$ ln -s /tmp/foo/bar/baz /tmp/symlink
$ realpath /tmp/symlink/..
  /tmp/foo/bar
$ node -e 'console.log(path.normalize("/tmp/symlink/.."))'
  /tmp

(These entries are always created when a directory is created, and they are made to be hardlinks to the directory itself and to its parent. They are some of the rare examples of directory hardlinks, as systems won't normally allow regular users to make directory hardlinks.

That's why the hardlink count in ls -ld listings increases for a directory whenever you make a subdirectory, and why directories start at hardlink count 2 (their name in the parent, and their own .). The subdirectory's .. is a hardlink to the current directory and thereby increases its hardlink count.)
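The link-count behaviour can be observed from Python with `os.stat`. This is only a sketch: the helper name `link_counts` is made up, and the exact numbers depend on the file system (some, such as btrfs, do not maintain directory link counts in the traditional way), so the code reports the counts instead of assuming them.

```python
import os
import tempfile


def link_counts():
    """Hypothetical helper: link count of a directory before and after adding a subdirectory."""
    with tempfile.TemporaryDirectory() as base:
        parent = os.path.join(base, "parent")
        os.mkdir(parent)
        before = os.stat(parent).st_nlink

        child = os.path.join(parent, "child")
        os.mkdir(child)
        after = os.stat(parent).st_nlink

        # child/.. should be the very same directory (same device and inode) as parent
        same_dir = os.path.samestat(os.stat(os.path.join(child, "..")),
                                    os.stat(parent))
        return before, after, same_dir


before, after, same_dir = link_counts()
print(before, after, same_dir)
```

On traditional Unix file systems the second number should be one higher than the first, because the new subdirectory's .. entry is one more link to the parent.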
