Directory Filenames – Path Syntax Rules


I'm writing a library for manipulating Unix path strings. That being the case, I need to understand a few obscure corners of the syntax that most people wouldn't worry about.

For example, as best as I can tell, it seems that foo/bar and foo//bar both point to the same place.

Also, ~ usually stands for the user's home directory, but what if it appears in the middle of a path? What happens then?

These and several dozen other obscure questions need answering if I'm going to write code which handles every possible case correctly. Does anybody know of a definitive reference which explains the exact syntax rules for this stuff?

(Unfortunately, searching for terms like "Unix path syntax" just turns up a million pages discussing the $PATH variable… Heck, I'm even struggling to find suitable tags for this question!)

Best Answer

There are three types of paths:

  • relative paths like foo, foo/bar, ../a or .. itself. They don't start with / and are resolved relative to the current directory of the process making a system call with that path.
  • absolute paths like /, /foo/bar or ///x. They start with one slash, or with three or more, and are looked up starting from the root directory /.
  • paths like //foo that start with exactly two slashes. POSIX allows these to be treated specially but doesn't specify how; some systems use them for special cases like network files.

Other than at the start, sequences of slashes act like one.
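
To make the three cases and the slash-collapsing rule concrete, here is a minimal Python sketch. The function names classify and collapse_slashes are my own, not from any standard library, and the code only looks at the string; it never touches the filesystem.

```python
import re

def classify(path: str) -> str:
    """Return 'relative', 'absolute' or 'double-slash' for a non-empty path string."""
    if not path.startswith("/"):
        return "relative"
    leading = len(path) - len(path.lstrip("/"))
    # Exactly two leading slashes is the case POSIX leaves implementation-defined.
    return "double-slash" if leading == 2 else "absolute"

def collapse_slashes(path: str) -> str:
    """Collapse runs of slashes into one, keeping an exactly-two-slash prefix intact."""
    prefix = ""
    if classify(path) == "double-slash":
        prefix, path = "//", path[2:]
    return prefix + re.sub(r"/{2,}", "/", path)

print(collapse_slashes("foo//bar"))    # foo/bar
print(collapse_slashes("///x"))        # /x
print(collapse_slashes("//foo//bar"))  # //foo/bar
```

Note that a trailing slash survives (foo// becomes foo/), which matters for the caveats further down.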

~ is only special to the shell: it's expanded by the shell and means nothing to the system itself. How it's expanded depends on the shell. Shells also perform other expansions, such as globbing (*.txt) and variable expansion (/$foo/$bar). As far as the system is concerned, ~foo is just a relative path like _foo or foo.
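
A quick way to convince yourself of this is to bypass the shell entirely. The sketch below (the function name is made up for illustration) creates a directory literally named ~foo through a relative path in a scratch directory. It also shows that Python's os.path.expanduser, which is a library helper and not the shell, only expands a leading tilde.

```python
import os
import tempfile

def tilde_is_an_ordinary_name() -> bool:
    """Create and look up a directory literally named '~foo' via a relative path."""
    previous = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            os.mkdir("~foo")    # handed to the system as-is, no expansion happens
            return os.path.isdir("~foo") and "~foo" in os.listdir(".")
        finally:
            os.chdir(previous)  # leave the scratch directory before it is removed

# expanduser only touches a leading ~ or ~user, so this string is returned unchanged.
mid_path = os.path.expanduser("foo/~/bar")
```

Other shells and libraries are free to choose different expansion rules, so a path library probably should keep expansion as a separate, optional step.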

Things to bear in mind:

  • foo/ is not the same as foo. For most system calls on most systems it behaves more like foo/. than like foo, which matters especially if foo is a symlink. (foo// is the same as foo/, though.)
  • a/b/../c is not necessarily the same as a/c (for instance if a/b is a symlink). It's best not to treat .. specially.
  • it's generally safe to consider a/././././b the same as a/b though.
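
The first two caveats can be checked with a small experiment. This is a sketch assuming a system that supports symlinks; the function names and the directory layout are invented for the demonstration, and the trailing-slash result is what I'd expect on Linux rather than something guaranteed everywhere.

```python
import os
import stat
import tempfile

def dotdot_through_symlink():
    """Compare a/b/../c with a/c when a/b is a symlink pointing elsewhere."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a", "c"))
        os.makedirs(os.path.join(tmp, "elsewhere", "target"))
        # a/b is a symlink, so a/b/.. is "elsewhere", not "a"
        os.symlink(os.path.join(tmp, "elsewhere", "target"),
                   os.path.join(tmp, "a", "b"))
        via_dotdot = os.path.join(tmp, "a", "b", "..", "c")
        lexical = os.path.join(tmp, "a", "c")
        return os.path.exists(via_dotdot), os.path.exists(lexical)

def trailing_slash_follows_symlink():
    """lstat a symlink to a directory with and without a trailing slash."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "real"))
        link = os.path.join(tmp, "foo")
        os.symlink(os.path.join(tmp, "real"), link)
        bare_is_link = stat.S_ISLNK(os.lstat(link).st_mode)
        slash_is_dir = stat.S_ISDIR(os.lstat(link + "/").st_mode)
        return bare_is_link, slash_is_dir

print(dotdot_through_symlink())  # (False, True): a/b/../c is elsewhere/c, which doesn't exist
print(trailing_slash_follows_symlink())
```

This is also why a purely lexical normalizer such as Python's posixpath.normpath, which does collapse .., can change what a path refers to.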