I wrote a little bash script to see what happens when I keep following a symbolic link that points to the same directory. I was expecting it to either make a very long working directory, or to crash. But the result surprised me…
```bash
mkdir a
cd a
ln -s ./. a
for i in $(seq 1 1000)
do
  cd a
  pwd
done
```
Some of the output is
${HOME}/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a
${HOME}/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a
${HOME}/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a
${HOME}/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a
${HOME}/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a
${HOME}/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a
${HOME}/a
${HOME}/a/a
${HOME}/a/a/a
${HOME}/a/a/a/a
${HOME}/a/a/a/a/a
${HOME}/a/a/a/a/a/a
${HOME}/a/a/a/a/a/a/a
${HOME}/a/a/a/a/a/a/a/a
What is happening here?
Best Answer
Patrice identified the source of the problem in his answer, but if you want to know how to get from there to why you get that, here's the long story.
The current working directory of a process is nothing you'd think too complicated. It is an attribute of the process which is a handle to a file of type directory where relative paths (in system calls made by the process) start from. When resolving a relative path, the kernel doesn't need to know the full path (or any path) to that current directory; it just reads the directory entries in that directory file to find the first component of the relative path (and `..` is like any other file in that regard) and continues from there.
Now, as a user, you sometimes like to know where that directory lies in the directory tree. With most Unices, the directory tree is a tree, with no loops. That is, there's only one path from the root of the tree (`/`) to any given file. That path is generally called the canonical path.
To get the path of the current working directory, what a process has to do is just walk up (well, down if you like to see a tree with its root at the bottom) the tree back to the root, finding the names of the nodes on the way.
For instance, a process trying to find out that its current directory is `/a/b/c` would open the `..` directory (relative path, so `..` is the entry in the current directory) and look for a file of type directory with the same inode number as `.`, find out that `c` matches, then open `../..` and so on until it finds `/`. There's no ambiguity there.
That's what the `getwd()` or `getcwd()` C functions do, or at least used to do. On some systems like modern Linux, there's a system call to return the canonical path to the current directory, which does that lookup in kernel space (and allows you to find your current directory even if you don't have read access to all its components), and that's what `getcwd()` calls there. On modern Linux, you can also find the path to the current directory via a `readlink()` on `/proc/self/cwd`.
That's what most languages and early shells do when returning the path to the current directory.
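The walk described above can be sketched in shell. This is only an illustration under stated assumptions, not the real libc code: `walk_up_pwd` is a hypothetical helper, and it uses the `[ file1 -ef file2 ]` test (same device and inode number), which bash, dash and ksh support. Like the user-space `getcwd()`, it needs read access to every directory on the way up.

```shell
#!/usr/bin/env bash
# Sketch of the classic user-space getcwd() walk (illustration only):
# find the entry of ".." that is the same directory as ".", prepend its
# name, move up with a physical cd, and stop when "." and ".." are the
# same directory, which only happens at the root "/".
# walk_up_pwd is a hypothetical name, not a standard command.
walk_up_pwd() (              # ( ) body: the cd calls stay in a subshell
  path=
  until [ . -ef .. ]; do     # -ef compares device and inode numbers
    name=
    for entry in ../* ../.[!.]* ../..?*; do
      # skip symlinks: only the real directory entry names this directory
      if [ ! -L "$entry" ] && [ "$entry" -ef . ]; then
        name=${entry#../}
        break
      fi
    done
    [ -n "$name" ] || exit 1   # e.g. ".." not readable; leaves the subshell
    path=/$name$path
    cd -P .. || exit 1
  done
  printf '%s\n' "${path:-/}"
)

# Usage: in a fresh temporary directory, both lines print the same path.
cd "$(mktemp -d)" && mkdir -p x/y && cd x/y || exit 1
walk_up_pwd
pwd -P
```

Comparing the device as well as the inode number is what makes the sketch stop at the right place at mount points, where the root of a mounted file system can have the same inode number as the directory above it.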
In your case, you can call `cd a` as many times as you want: because it's a symlink to `.`, the current directory doesn't change, so all of `getcwd()`, `pwd -P`, `python -c 'import os; print(os.getcwd())'` and `perl -MPOSIX -le 'print getcwd'` would return your `${HOME}/a`.
Now, symlinks went complicating all that.
Symlinks allow jumps in the directory tree. In `/a/b/c`, if `/a` or `/a/b` or `/a/b/c` is a symlink, then the canonical path of `/a/b/c` could be something completely different. In particular, the `..` entry in `/a/b/c` is not necessarily `/a/b`.
In the Bourne shell, if you do `cd /a/b/c` and then `cd ..`, or even `cd /a/b/c/..`, there's no guarantee you'll end up in `/a/b`, just like opening `/a/b/c/../d` with any other command is not necessarily the same as opening `/a/b/d`.
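A small sketch of that last point, assuming a throwaway directory `$T` created with `mktemp -d` stands in for `/` (the names `a`, `b`, `c` and `elsewhere` are made up for the example). `$T/a/b/c` is a symlink to `$T/elsewhere/c`, so the kernel's `..` after `c` is `$T/elsewhere`, not `$T/a/b`. The `[ x -ef y ]` test follows symlinks and compares device and inode numbers.

```shell
#!/usr/bin/env bash
# Sketch: $T/a/b/c is a symlink, so $T/a/b/c/.. is the parent of the
# symlink's target, not $T/a/b. $T is a throwaway directory standing in for /.
T=$(mktemp -d)
mkdir -p "$T/a/b" "$T/elsewhere/c"
ln -s ../../elsewhere/c "$T/a/b/c"    # $T/a/b/c -> $T/elsewhere/c

# The kernel resolves "c" first, then uses the target's own ".." entry.
if [ "$T/a/b/c/.." -ef "$T/elsewhere" ]; then
  echo "a/b/c/.. is elsewhere"
fi
if ! [ "$T/a/b/c/.." -ef "$T/a/b" ]; then
  echo "a/b/c/.. is not a/b"
fi
```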
`ksh` introduced a concept of a logical current working directory to somehow work around that. People got used to it and POSIX ended up specifying that behaviour, which means most shells nowadays do it as well:
For the `cd` and `pwd` builtin commands (and only for them, though also for `popd`/`pushd` on shells that have them), the shell maintains its own idea of the current working directory. It's stored in the `$PWD` special variable.
When you do `cd c/d`, even if `c` or `c/d` are symlinks, while `$PWD` contains `/a/b`, it appends `c/d` to the end, so `$PWD` becomes `/a/b/c/d`. And when you do `cd ../e`, instead of doing `chdir("../e")`, it does `chdir("/a/b/c/e")`.
And the `pwd` command only returns the content of the `$PWD` variable.
That's useful in interactive shells because `pwd` outputs a path to the current directory that gives information on how you got there, and as long as you only use `..` in arguments to `cd` and not other commands, it's less likely to surprise you, because `cd a; cd ..` or `cd a/..` would generally get you back to where you were.
Now, `$PWD` is not modified unless you do a `cd`. Until the next time you call `cd` or `pwd`, a lot of things could happen: any of the components of `$PWD` could be renamed. The current directory never changes (it's always the same inode, though it could be deleted), but its path in the directory tree could change completely. `getcwd()` computes the current directory each time it's called by walking up the directory tree, so its information is always accurate, but for the logical directory implemented by POSIX shells, the information in `$PWD` might become stale. So upon running `cd` or `pwd`, some shells may want to guard against that.
In that particular instance, you see different behaviours with different shells.
Some, like `ksh93`, ignore the problem completely, so will return incorrect information even after you call `cd` (and you wouldn't see the behaviour that you're seeing with `bash` there).
Some, like `bash` or `zsh`, do check that `$PWD` is still a path to the current directory upon `cd`, but not upon `pwd`.
pdksh does check upon both `pwd` and `cd` (but upon `pwd`, does not update `$PWD`).
`ash` (at least the one found on Debian) does not check, and when you do `cd a`, it actually does `cd "$PWD/a"`, so if the current directory has changed and `$PWD` no longer points to the current directory, it will actually not change to the `a` directory in the current directory, but to the one in `$PWD` (and return an error if it doesn't exist).
If you want to play with it, you can rename one of the components of `$PWD` from under the shell and then compare `pwd`, `echo "$PWD"`, `pwd -P` and the effect of `cd` in various shells.
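One possible way to set up such an experiment, as a sketch (the directory names are made up and the commands are not from the original answer): it renames a directory that is part of `$PWD` while the shell is inside it. Only `pwd -P` is required to compute the physical path afresh; what `pwd` and `cd .` do afterwards is exactly what varies between shells, so run the file with `bash`, `zsh`, `ksh93`, `dash` and so on.

```shell
#!/bin/sh
# Sketch of an experiment: rename a directory that is part of $PWD behind
# the shell's back, then compare the logical and physical views.
T=$(mktemp -d)
mkdir -p "$T/old/sub"
cd "$T/old/sub" || exit 1
mv "$T/old" "$T/new"       # same current directory, different path

echo "\$PWD:   $PWD"       # the shell's logical idea: still .../old/sub
echo "pwd:    $(pwd)"      # may or may not notice, depending on the shell
echo "pwd -P: $(pwd -P)"   # physical path, computed afresh
cd . && echo "after cd .: $PWD"   # the outcome of this cd also varies
```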
In your case, since you're using `bash`, after a `cd a`, `bash` checks that `$PWD` still points to the current directory. To do that, it calls `stat()` on the value of `$PWD` to check its inode number and compares it with that of `.`.
But when looking up the `$PWD` path involves resolving too many symlinks, that `stat()` returns with an error, so the shell cannot check whether `$PWD` still corresponds to the current directory, so it computes it again with `getcwd()` and updates `$PWD` accordingly.
Now, to clarify Patrice's answer, that check of the number of symlinks encountered while looking up a path is there to guard against symlink loops. The simplest loop can be made with `ln -s a b; ln -s b a` (`b` pointing to `a` and `a` pointing to `b`).
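A sketch showing that safeguard in action, in a throwaway directory (the names `d`, `few` and `many` are made up for the example). A lookup through the `a`/`b` loop fails, typically with `ELOOP` ("Too many levels of symbolic links"), and so does a path that goes through the question's `./.` symlink too many times, which is what makes `bash`'s `stat()` of `$PWD` fail after enough `cd a`. The exact limit is system-dependent (40 on Linux), so the sketch uses 100 hops to stay above it.

```shell
#!/usr/bin/env bash
# Sketch: symlink loops, and the symlink limit behind the reset of $PWD.
T=$(mktemp -d)
cd "$T" || exit 1

# 1. The simplest loop: a -> b and b -> a.
ln -s a b
ln -s b a
if ! cat a/x 2>/dev/null; then
  echo "a/x: lookup failed (symlink loop)"
fi

# 2. The question's setup: a real directory d containing d/a -> ./.
mkdir d
ln -s ./. d/a
few="d/$(printf 'a/%.0s' {1..5})"      # d/a/a/a/a/a/: 5 symlink hops
many="d/$(printf 'a/%.0s' {1..100})"   # 100 symlink hops, above the limit
[ -d "$few" ] && echo "5 hops: still resolves"
[ -d "$many" ] || echo "100 hops: too many levels of symbolic links"
```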
Without that safeguard, upon a `cd a/x`, the system would have to find where `a` links to, find it's `b`, which is a symlink that links to `a`, and that would go on indefinitely. The simplest way to guard against that is to give up after resolving more than an arbitrary number of symlinks.
Now back to the logical current working directory and why it's not so good a feature. It's important to realise that it's only for `cd` in the shell and not other commands.
For instance, `cd -- "$dir" && vi -- "$file"` is not always the same as `vi -- "$dir/$file"`.
That's why you'll sometimes find that people recommend always using `cd -P` in scripts to avoid confusion (you don't want your software to handle an argument of `../x` differently from other commands just because it's written in shell instead of another language).
The `-P` option disables the logical directory handling, so `cd -P -- "$var"` actually does call `chdir()` on the content of `$var` (at least as long as `$CDPATH` is not set, and except when `$var` is `-` (or possibly `-2`, `+3`... in some shells), but that's another story). And after a `cd -P`, `$PWD` will contain a canonical path.
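To see the logical and the physical handling side by side, here is a sketch for `bash`, assuming no `CDPATH` and a made-up layout in a throwaway directory `$T` (canonicalized with `pwd -P`) where `$T/a/b/c` is a symlink to `$T/elsewhere/c`.

```shell
#!/usr/bin/env bash
# Sketch: logical (default cd) versus physical (cd -P) handling of "..".
unset CDPATH                              # CDPATH would change how "cd c" is looked up
T=$(cd -P -- "$(mktemp -d)" && pwd -P)    # canonical path of a fresh temp dir
mkdir -p "$T/a/b" "$T/elsewhere/c"
ln -s ../../elsewhere/c "$T/a/b/c"        # $T/a/b/c -> $T/elsewhere/c

cd "$T/a/b" && cd c      # logical: "c" is appended to $PWD, symlink and all
logical=$PWD             # $T/a/b/c
cd ..                    # logical: "c" is removed textually from $PWD
back=$PWD                # $T/a/b

cd "$T/a/b/c" && cd -P ..   # physical: ".." of the real $T/elsewhere/c
physical=$PWD               # $T/elsewhere, a canonical path

printf 'logical:  %s\nback:     %s\nphysical: %s\n' "$logical" "$back" "$physical"
```

With plain `cd`, `..` just strips the last component of `$PWD`; with `cd -P`, it is the real parent of the directory you are in, and `$PWD` is left holding a canonical path.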