What you're asking for doesn't make much sense in the general case, so it's not surprising that find has no provision for it.
A symlink with a relative target is resolved relative to the directory that contains the symlink. So, for instance, if find, while traversing a directory tree and following symlinks, encounters a/b/c/d, and a, a/b and a/b/c are all relative or absolute symlinks (or symlinks to paths with symlink components), what should it do?
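To make the first point concrete, here is a minimal sketch in a scratch directory (every name in it is made up for illustration). The relative target ../target is interpreted from the directory holding the link, not from the current working directory:

```shell
# Scratch area; every name below is hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b" "$tmp/elsewhere"
echo hello > "$tmp/a/target"

# The target "../target" is stored verbatim in the link and is
# interpreted from the link's own directory ($tmp/a/b).
ln -s ../target "$tmp/a/b/link"

# Reading through the link works from an unrelated working directory.
cd "$tmp/elsewhere"
link_content=$(cat "$tmp/a/b/link")
echo "$link_content"
```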
If you're looking for a find predicate or a GNU -printf % directive that expands to a symlink-free path to the file, relative to the current directory or any other directory, I'm afraid there's none.
If you're on Linux, you can get the absolute path of those files with:
find -L foo -type f -exec readlink -f {} \;
As you found out, there is at least one realpath command that accepts more than one path argument. Combined with the standard -exec cmd {} + syntax, that is going to be a lot more efficient, since it runs as few realpath commands as necessary:
find -L foo -type f -exec realpath {} +
find -L foo -type f -print0 | xargs -r0 realpath
might be quicker: if more than one realpath command is needed, find can keep looking for more files while the first realpath starts working, which might make it more efficient even on a single-processor system.
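The gain from {} + over {} \; can be seen by counting how many times the command is started. This is only a sketch: the five scratch files are hypothetical, and sh -c 'echo run' stands in for realpath so that each invocation prints exactly one line.

```shell
# Five hypothetical files in a scratch directory.
tmp=$(mktemp -d)
touch "$tmp/f1" "$tmp/f2" "$tmp/f3" "$tmp/f4" "$tmp/f5"

# "\;" starts one command per file: five files, five invocations.
per_file=$(find "$tmp" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# "+" appends as many paths as fit onto each command line,
# so these five short paths end up in a single invocation.
batched=$(find "$tmp" -type f -exec sh -c 'echo run' sh {} + | wc -l)

echo "per-file invocations: $((per_file))"
echo "batched invocations:  $((batched))"
```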
-print0 and xargs -r0 are not standard; they come from GNU but are found in a number of other implementations, such as most modern BSDs.
Zsh has builtin support for it:
print -rl foo/***/*(-.:A)
If you don't care about the sorting order, you can disable sorting and make it a bit more efficient with:
print -rl foo/***/*(-.oN:A)
If you want to convert those to relative paths to the current directory, you could have a look at that SO question.
If you know that all those files have an absolute canonical path (one in which none of the components is a symlink) inside the current directory, you can simplify it to (still with zsh):
files=(foo/***/*(-.:A))
print -rl -- ${files#$PWD/}
Though it is short and convenient, and works whatever characters the filenames contain, I doubt it would be faster than find + realpath.
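The ${files#$PWD/} step relies on the # prefix-removal expansion, which zsh applies to every element of the array. As a minimal sketch of the same operator in POSIX sh on a single value (base and path are hypothetical stand-ins for $PWD and one canonical path):

```shell
# Hypothetical values standing in for $PWD and one canonical path.
base=/home/user/project
path=/home/user/project/foo/bar.txt

# ${var#pattern} removes the shortest matching prefix. Quoting "$base"
# makes sh treat it as a literal string rather than a glob pattern.
rel=${path#"$base"/}
echo "$rel"

# A path outside $base has no matching prefix and is left unchanged.
other=/etc/hosts
unchanged=${other#"$base"/}
echo "$unchanged"
```

A path that is left unchanged stays absolute, which is why the shortcut only suits files known to live below the current directory.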
With the Debian realpath and GNU tools, you can do:
cd -P .
find -L foo -type f -exec realpath -z {} + |
gawk -v p="$PWD" -v l="${#PWD}" -v RS='\0' -vORS='\0' '
substr($0, 1, l+1) == p "/" {$0 = substr($0, l+2)}; 1' |
xargs -r0 whatever you want to do with them
As I realise now, recent versions of GNU coreutils include a realpath that has the exact feature you're looking for, so it's just a matter of:
find -L foo -type f -print0 |
xargs -r0 realpath -z --relative-base . |
xargs -r0 whatever you want to do with them
(Use --relative-to . instead of --relative-base . if you want relative paths even for files whose symlink-free path doesn't reside below the current working directory.)
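The difference between the two options shows up only for files outside the base directory. A small sketch, assuming GNU coreutils realpath is installed; the directory and file names are hypothetical:

```shell
# Assumes GNU coreutils realpath; directory names are hypothetical.
tmp=$(mktemp -d)
cd -P "$tmp"
mkdir -p work/sub outside
touch work/sub/inside.txt outside/far.txt
cd work

# --relative-base: relative only for paths below ".", absolute otherwise.
base_inside=$(realpath --relative-base . sub/inside.txt)
base_outside=$(realpath --relative-base . ../outside/far.txt)

# --relative-to: always relative, climbing with ".." when needed.
to_outside=$(realpath --relative-to . ../outside/far.txt)

printf '%s\n' "$base_inside" "$base_outside" "$to_outside"
```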
Why use hard links vs. symbolic links
There are primarily 3 advantages of using hard links over symbolic links in this scenario.
Hard links
- With a hard link, the link points to the inode directly.
- Hard links are like having multiple copies of the executable but only using the disk space of one.
- You can rename either branch of the hard link without breaking anything.
Symbolic links
- The link points to the path name of the object (which in turn points to the inode).
- They can span filesystems, whereas hardlinks cannot.
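The renaming point can be checked with a short experiment. This is a sketch in a scratch directory with made-up file names: after the original name is renamed, the hard link still reaches the data while the symbolic link is left dangling.

```shell
tmp=$(mktemp -d)
cd "$tmp"
echo data > original
ln original hard        # a second name for the same inode
ln -s original soft     # stores the path name "original"

mv original renamed     # the inode is untouched; only one name changes

hard_content=$(cat hard)    # still readable through the hard link
if [ -e soft ]; then soft_state=ok; else soft_state=dangling; fi

echo "hard link reads: $hard_content"
echo "symlink is: $soft_state"
```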
Advantages of linking in general
These links exist because many executables behave differently based on how they were called. For example, the 2 commands bzless and bzmore are actually a single executable, bzmore. The executable will behave differently depending on which name was used to invoke it.
This is done for a variety of reasons. Here are some of the more obvious ones:
- Easier to develop a single executable rather than many
- Saves disk space
- Easier to deploy
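The mechanism behind such multi-call programs is simply a look at the name the program was invoked as. A minimal sketch with a hypothetical script called multi (not how bzmore itself is written), reached through one hard link and one symbolic link:

```shell
tmp=$(mktemp -d)

# A hypothetical multi-call script: it inspects the name it was run as.
cat > "$tmp/multi" <<'EOF'
#!/bin/sh
case ${0##*/} in
  hello) echo "Hello" ;;
  bye)   echo "Goodbye" ;;
  *)     echo "called as ${0##*/}" ;;
esac
EOF
chmod +x "$tmp/multi"

ln    "$tmp/multi" "$tmp/hello"   # hard link
ln -s multi        "$tmp/bye"     # symbolic link

out_hello=$("$tmp/hello")
out_bye=$("$tmp/bye")
out_plain=$("$tmp/multi")
printf '%s\n' "$out_hello" "$out_bye" "$out_plain"
```

Both kinds of link give the script a different $0, which is why either works for this purpose.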
Why are both being used?
The choice of either, in this particular application, is moot. Either can facilitate the feature of acting as an alias so that a single executable can be overloaded. That's really the key feature that is getting exploited by the developers of the various programs here.
The FHS (Filesystem Hierarchy Standard) even specifies it this way: the link can be either.
excerpt
If /bin/sh is not a true Bourne shell, it must be a hard or symbolic
link to the real shell command.
The rationale behind this is because sh and bash mightn't necessarily
behave in the same manner. The use of a symbolic link also allows
users to easily see that /bin/sh is not a true Bourne shell.
...
...
If the gunzip and zcat programs exist, they must be symbolic or hard
links to gzip. /bin/csh may be a symbolic link to /bin/tcsh or
/usr/bin/tcsh.
Best Answer
First: is there a reason you need to use symlinks and not the usual hard links? I am having a hard time understanding the need for symlinks with relative paths. Here is how I would solve this problem:
I think the Debian (Ubuntu) version of fdupes can replace duplicates with hard links using the -L option, but I don't have a Debian installation to verify this.
If you do not have a version with the -L option, you can use this tiny bash script I found on commandlinefu.
Note that this syntax will only work in bash.
The above command will find all duplicate files in "path" and replace them with hard links. You can verify this by running ls -ilR and looking at the inode number. Here is a sample with ten identical files:
All the files have separate inode numbers, making them separate files. Now let's deduplicate them:
The files now all have the same inode number, meaning they all point to the same physical data on disk.
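For systems without fdupes, the same idea can be sketched in plain sh. This is not the commandlinefu script, only a minimal illustration: dedupe_hardlink is a hypothetical helper name, and it assumes file names without newlines or leading/trailing blanks, all on one filesystem.

```shell
# dedupe_hardlink DIR: replace files with identical content by hard links.
# Assumes file names contain no newlines or leading/trailing blanks, and
# that everything under DIR lives on one filesystem.
dedupe_hardlink() {
  # cksum prints "checksum size name"; sorting in the C locale puts
  # files with the same checksum and size next to each other.
  find "$1" -type f -exec cksum {} + | LC_ALL=C sort |
  {
    prev_key='' prev_file=''
    while read -r sum size file; do
      key="$sum $size"
      if [ "$key" = "$prev_key" ] && cmp -s -- "$prev_file" "$file"; then
        # Same checksum, size, and bytes: make $file another name for
        # $prev_file, unless the two are already the same inode.
        [ "$prev_file" -ef "$file" ] || ln -f -- "$prev_file" "$file"
      else
        prev_key=$key prev_file=$file
      fi
    done
  }
}
```

Running dedupe_hardlink path and then ls -ilR path should show the former duplicates sharing one inode number. The cmp check guards against two different files that happen to have the same checksum and size.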
I hope this solves your problem or at least points you in the right direction!