Linux – find and symbolic link

command-line, find, linux, symlink

I was skimming the documentation of find to learn how to use the command better.

I was reading the part that says

GNU find will handle symbolic links in one of two ways; firstly, it
can dereference the links for you – this means that if it comes across
a symbolic link, it examines the file that the link points to, in
order to see if it matches the criteria you have specified. Secondly,
it can check the link itself in case you might be looking for the
actual link. If the file that the symbolic link points to is also
within the directory hierarchy you are searching with the find
command, you may not see a great deal of difference between these two
alternatives.

By default, find examines symbolic links themselves when it finds them
(and, if it later comes across the linked-to file, it will examine
that, too).

To my understanding, if I do something like:

find -L -iname "*foo*"

this will search the current directory recursively and, when it encounters a symlink, follow the link to the original file. If the original file's name matches the pattern *foo*, the link is reported.

However, this doesn't seem to be the case. I have

main-file
sl-file -> main-file

Running the command above find -L -iname "*main*" reports

./main-file

And I was expecting

./main-file # because it matches the criterion
./sl-file   # because the file it points to matches the criterion

That being said, another test such as -type works as I expect. Say I have this:

main-file
dir/sl-file -> ../main-file

Running this

find dir -type f

returns nothing. But this

find -L dir -type f

reports dir/sl-file.

What gives?

I have gone through this post that says a file name isn't a file property. This is something I can't really get my head around.

Best Answer

The GNU find documentation is not as strict in its terminology as the POSIX specification. The latter is clearer, so I will refer to it. POSIX doesn't define -iname, so I will concentrate on -name. I assume -iname is designed to be like -name, only case-insensitive. Therefore I expect all properties of -name that have nothing to do with case to apply to -iname as well.

These are relevant parts of the POSIX documentation:

find [-H|-L] path... [operand_expression...]
The find utility shall recursively descend the directory hierarchy from each file specified by path, […]. Each path operand shall be evaluated unaltered as it was provided, including all trailing <slash> characters; all pathnames for other files encountered in the hierarchy shall consist of the concatenation of the current path operand, a <slash> if the current path operand did not end in one, and the filename relative to the path operand. The relative portion shall contain no dot or dot-dot components, no trailing <slash> characters, and only single <slash> characters between pathname components.

-name pattern
The primary shall evaluate as true if the basename of the current pathname matches pattern using the pattern matching notation […]

-print
[…] it shall cause the current pathname to be written to standard output.

And definitions:

Basename
For pathnames containing at least one filename: the final, or only, filename in the pathname. […]

Filename
A sequence of bytes […] used to name a file. The bytes composing the name shall not contain the <NUL> or <slash> characters. […] A filename is sometimes referred to as a "pathname component". […]

Pathname
A string that is used to identify a file. […] It has optional beginning <slash> characters, followed by zero or more filenames separated by <slash> characters.

So -name is interested in the final filename in the current pathname; and the current pathname is a string that is used to identify the current file. Used by whom? In this case by find. Conceptually a pathname may have nothing to do with names in the filesystem. If find uses some string to identify a file then the string is called "pathname" and -name uses it.

Invoke find . -print or find -L . -print. You will see all pathnames used by this particular invocation of find. Their final filenames are what -name would test if you used -name.


In your example with main-file and sl-file, the command is find -L -iname "*main*". There is implicit -print at the end, the output you observed is from -print. You expected:

./main-file # because it matches the criterion
./sl-file   # because the file it points to matches the criterion

But if that were the case, it would mean -print gave you ./main-file and ./sl-file. Those would then be the exact pathnames, and main-file and sl-file the respective basenames that -name (or -iname) tested.

This doesn't fit. Only one of these basenames matches the pattern you used (*main*). This is why you got only one result. Specifying -name "*main*" (or -iname "*main*") and expecting ./sl-file to appear is equivalent to expecting sl-file to match *main*.
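To make the point concrete, here is a minimal sketch that imitates what -name "*main*" does with the two pathnames from the question. It is only an illustration, not how find is implemented: the loop and the variable names (pathname, base, matches) are my own, and no file is touched at all, only strings are compared.

```shell
# Imitate -name "*main*" on the two pathnames from the question.
# Nothing here looks at the filesystem; only strings are compared.
matches=""
for pathname in ./main-file ./sl-file; do
    base=${pathname##*/}                 # final filename of the pathname
    case $base in
        *main*) matches="${matches:+$matches }$pathname" ;;
    esac
done
printf '%s\n' "$matches"
```

Only ./main-file survives, because sl-file (the string) does not match *main*.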

It would make some sense to expect ./main-file to appear twice. The premise would be that the symlink causes find to change the second pathname from ./sl-file to ./main-file. Then both pathnames would match *main* and both would be printed as ./main-file. This doesn't happen.

If you'd like this to happen, consider a symlink bar pointing to /etc/fstab and placed in /tmp/foo/. We're in the foo directory. What should find -L . print (besides .)? It seems you'd like this pathname to pass the -name fstab test, so the basename must be fstab. On the other hand, according to the rules, the pathname must begin with ./ (because . is the provided path) and shall contain no dot-dot components. There is no sane and meaningful pathname that satisfies both. Now what? Fortunately, in such a case the tool prints just ./bar. This is the pathname, and as a string it carries no connection to fstab.
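This can be tried in a scratch directory. The sketch below is an assumption-laden stand-in for the scenario above: instead of /etc/fstab and /tmp/foo it uses a temporary directory with a hypothetical outside/fstab file and a foo directory, so it does not depend on what exists on your system.

```shell
# Hypothetical layout: outside/fstab stands in for /etc/fstab,
# foo is the directory being searched.
tmp=$(mktemp -d)
mkdir "$tmp/outside" "$tmp/foo"
: > "$tmp/outside/fstab"
ln -s ../outside/fstab "$tmp/foo/bar"
cd "$tmp/foo" || exit 1

by_target=$(find -L . -name fstab)   # basename of ./bar is "bar": no match
by_link=$(find -L . -name bar)       # matches the pathname find built
printf 'name fstab: [%s]\nname bar: [%s]\n' "$by_target" "$by_link"

cd / && rm -rf "$tmp"                # clean up the scratch directory
```

Even with -L, the only pathname find has for the symlink is ./bar, so that is the only string -name can test.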


A few examples that don't use symlinks but show how -name works:

  1. cd /etc && find . -name . 2>/dev/null

    It finds . despite the fact its "real" (specific, in-the-filesystem) name is etc. It doesn't find subdirectories despite the fact any directory can be . in some circumstances.

  2. cd /etc && find . -name etc 2>/dev/null

    It finds neither "etc" nor ".": no pathname used by this invocation has etc as its basename.

  3. Create an empty FAT32 filesystem and mount it, cd to the mountpoint. The filesystem is case-insensitive and Linux knows it. Create a file named a in the filesystem. Experiment like this:

    • $ find .
      .
      ./a

      In this case the tool must have obtained a from the filesystem at some point.

    • $ find a
      a
      $ find A
      A

      In this case the tool uses a or A taken from its command line argument. The filesystem only confirms such file exists. The filesystem (and the OS) knows this particular file can be referred to as a or A.

    • $ find a -name A
      $ find A -name a

      Nothing! This shows -name doesn't care what the filesystem knows about the file. Only the pathname used by find matters.

      Somewhat similarly in case of your example: -iname doesn't care what the filesystem knows about the symlink and its target. Only the pathname used by find matters.
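The first two examples can be reproduced without touching /etc. This is a sketch under my own assumptions: etc-like is a hypothetical directory name standing in for etc, created in a temporary location. The third command is an addition of mine to show that the same directory is found by name once its name is part of the pathname given on the command line.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/etc-like"                 # hypothetical stand-in for /etc
: > "$tmp/etc-like/passwd"
cd "$tmp/etc-like" || exit 1

as_dot=$(find . -name .)                               # pathname ".", basename "."
as_name=$(find . -name etc-like)                       # no pathname ends in etc-like
from_parent=$(cd .. && find etc-like -name etc-like)   # pathname is now "etc-like"
printf '[%s] [%s] [%s]\n' "$as_dot" "$as_name" "$from_parent"

cd / && rm -rf "$tmp"                 # clean up the scratch directory
```

The directory is the same file in all three invocations; only the string find uses to identify it changes.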


To clarify and explicitly state what happens, let us go back to your example with the following directory structure:

.
├── main-file
└── sl-file -> main-file

find . -print or find -L . -print prints:

.
./main-file
./sl-file

These are the pathnames, i.e. the strings find uses to identify the three files (a directory is also a file). The string . comes from the command line. The other two were built by examining . (now I mean the file, not the string), learning that it is a directory, deciding whether to descend (in general, think of -prune, or -maxdepth if supported), and listing its contents: main-file, sl-file.

Note the string ./sl-file is built before anything is done to the file identified by it. To do anything with the file, find needs the string.

But -name or -print don't do anything to the file, they don't need its data or metadata. They work with the pathname, the string.

When -name "*main*" is evaluated for any pathname, the corresponding file or the entire filesystem is completely irrelevant. The only relevant thing is the pathname which is a string; more specifically the last component of it, i.e. the basename, also a string.

For any given pathname, -name doesn't care whether you used -L, whether the file is a symlink in the first place, where it points to, or whether it is broken. It works with the already known string.

On the other hand, tests like -type or -mtime need to query the filesystem about the file identified by the pathname. The string alone is not enough for them. In the case of a symlink, -L decides whether they query the target of the symlink or the symlink itself. Still, if -print is involved, it will print the pathname, regardless of what was queried.
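A quick way to see which file a test queries is -type l. The sketch below recreates the layout from the question in a temporary directory (the scratch location is my assumption; the file names are the ones from the question).

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
: > main-file
ln -s main-file sl-file

links_plain=$(find . -type l)       # the link itself is queried
links_follow=$(find -L . -type l)   # the target is queried; it is a regular file
printf 'without -L: [%s]\nwith -L: [%s]\n' "$links_plain" "$links_follow"

cd / && rm -rf "$tmp"               # clean up the scratch directory
```

Without -L the test sees a symlink and ./sl-file is printed. With -L the same pathname leads to a regular file, so -type l matches nothing here.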

In other words:

  • without -L

    • ./main-file string identifies ./main-file file of type f
    • ./sl-file string identifies ./sl-file file of type l
  • with -L

    • ./main-file string identifies ./main-file file of type f
    • ./sl-file string also identifies ./main-file file of type f

So pay attention to which tests and actions work with pathnames (strings) and which work with files.

-name and -print work with pathnames so find . -name "*main*" with or without -L will only print

./main-file

-type works with files so find . -type f will print one pathname:

./main-file

and find -L . -type f will print two pathnames:

./main-file
./sl-file
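If you want to check all four cases at once, something like the following should do. It is a sketch: the temporary directory and the variable names are mine, and the output is piped through sort only because find does not guarantee any particular order.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
: > main-file
ln -s main-file sl-file

# -name tests the pathname string, so -L makes no difference.
name_plain=$(find . -name '*main*' | LC_ALL=C sort)
name_follow=$(find -L . -name '*main*' | LC_ALL=C sort)

# -type queries the file, so -L changes what ./sl-file resolves to.
type_plain=$(find . -type f | LC_ALL=C sort)
type_follow=$(find -L . -type f | LC_ALL=C sort)

printf -- '-name:\n%s\n-L -name:\n%s\n' "$name_plain" "$name_follow"
printf -- '-type f:\n%s\n-L -type f:\n%s\n' "$type_plain" "$type_follow"

cd / && rm -rf "$tmp"               # clean up the scratch directory
```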