How is it possible to list duplicate file names on a Linux system,
- ignoring the case
- including all subdirectories?
Files should not be compared by their content but only by their names.
The output should be a list of file names including the path, so that one can run further commands on these files.
Let's assume we have
ls -1R /tmp/
foo
BAR
barfoo
a/BAr
a/b/bar
c/bAr
The output of the filter/find script should be
/tmp/BAR
/tmp/a/BAr
/tmp/a/b/bar
/tmp/c/bAr
Best Answer
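The pieces explained below fit together into a single pipeline. Here is a sketch of it, reconstructed from that explanation and assuming GNU find, sort, and uniq. The wrapper name dupnames is hypothetical and only there so the start directory can be passed as an argument:

```shell
# Reconstructed sketch; assumes GNU find (-printf) and GNU uniq (-D).
# dupnames is a hypothetical wrapper name. The start directory defaults to "."
dupnames() {
    # Print "path basename", sort case-insensitively on the basename,
    # then keep every line whose basename occurs more than once.
    find "${1:-.}" -printf "%p %f\n" | sort -f -k2 | uniq -Di -f1
}
```

Calling dupnames /tmp would search /tmp; appending | sed 's/ .*//' strips the repeated file name from each line.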
Specify your choice of starting directory for find if you don't want to start at . (the current directory). Add -type f if you want just file names.
- The find command produces a list of file (and directory) names, in directory order (i.e., random order, as far as you're concerned). -printf "%p %f\n" prints the full pathname (relative to .) and the filename.
- sort -f is short for sort --ignore-case, i.e., it sorts the filename list in a case-insensitive way. -k2 tells it to use the second field as the sort key.
- uniq -Di -f1 is short for uniq --all-repeated --ignore-case --skip-fields=1, i.e., it shows (all) the lines of output from find that occur repeatedly, based on case-insensitive comparison of the second field and beyond (i.e., that have the same (case-insensitive) file name).
This should give you the output that you want, except each line will have the filename repeated at the end. If you want to get rid of that, pipe into sed 's/ .*//'.
A couple of gotchas:
If you have directories whose names are the same except for case, and they contain files whose names are the same except for case, e.g.,
then these will be listed.
If you have files (or directories) whose names contain spaces, tabs, or newlines, this will break.
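The whitespace gotcha arises because sort and uniq split fields on blanks. One possible workaround, sketched here assuming GNU find and a POSIX awk, is to separate the two fields with a slash (which can never occur inside a base name) and do the grouping in awk instead. The function name dupnames_ws is hypothetical, and names containing newlines would still break it:

```shell
# Sketch only; assumes GNU find (-printf). dupnames_ws is a hypothetical name.
# Tolerates spaces and tabs in names, but NOT newlines.
dupnames_ws() {
    # Each record is "basename/path"; the first "/" always ends the basename.
    find "${1:-.}" -type f -printf '%f/%p\n' |
    awk '{
        i = index($0, "/")
        key = tolower(substr($0, 1, i - 1))   # case-folded base name
        path = substr($0, i + 1)              # full path as printed by find
        count[key]++
        paths[key] = paths[key] path "\n"
    }
    END {
        # Print the paths of every base name seen more than once.
        for (k in count)
            if (count[k] > 1)
                printf "%s", paths[k]
    }'
}
```

The groups come out in no particular order, and only the paths are printed, so no sed step is needed afterwards.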