Finding duplicate files by name ignoring the case in subdirectories

Tags: case sensitivity, duplicate, filenames, find

How can I list duplicate file names on a Linux system,

  • ignoring case, and
  • including all subdirectories?

Files should not be compared by their content but only by their names.
The output should be a list of file names including the path, so that one can run further commands on these files.

Let's assume we have

ls -1R /tmp/
foo
BAR
barfoo
a/BAr
a/b/bar
c/bAr

The output of the filter/find script should be

/tmp/BAR
/tmp/a/BAr
/tmp/a/b/bar
/tmp/c/bAr

Best Answer

find . -printf "%p %f\n" | sort -f -k2 | uniq -Di -f1

Specify a different starting directory for find if you don’t want to start at "." (the current directory). Add -type f if you want only regular files; without it, directory names are matched too.

  • The find command produces a list of file (and directory) names, in directory order (i.e., random order, as far as you’re concerned).
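  • -printf "%p %f\n" prints the full pathname (relative to the starting directory) and the file name (the last path component), separated by a space. Note that -printf is a GNU find extension.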
  • sort -f is short for sort --ignore-case, i.e., it sorts the list in a case-insensitive way.
  • -k2 tells it to use the second field (the file name) through the end of the line as the sort key.
  • uniq -Di -f1 is short for uniq --all-repeated --ignore-case --skip-fields=1, i.e., it shows all the lines of output from find that occur repeatedly, based on a case-insensitive comparison of the second field and beyond. In other words, it shows every line whose file name matches that of another line, ignoring case.

This should give you the output that you want, except each line will have the filename repeated at the end.  If you want to get rid of that, pipe into sed 's/ .*//'.
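As a sketch, here is the whole pipeline with the sed step appended, run against a copy of the example tree from the question. It assumes GNU find and GNU uniq. The function name dup_names and the scratch directory are only illustrative, and the scratch path is assumed to contain no whitespace.

```shell
# Hypothetical helper wrapping the pipeline from the answer.
# $1 is the starting directory (defaults to the current directory).
dup_names() {
    find "${1:-.}" -type f -printf "%p %f\n" |
        sort -f -k2 |        # case-insensitive sort on the file name
        uniq -Di -f1 |       # keep every line whose file name repeats
        sed 's/ .*//'        # drop the trailing file name column
}

# Recreate the example tree from the question in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/a/b" "$demo/c"
touch "$demo/foo" "$demo/BAR" "$demo/barfoo" \
      "$demo/a/BAr" "$demo/a/b/bar" "$demo/c/bAr"

# Lists the four "bar" variants; foo and barfoo are not reported.
dup_names "$demo"
```

The order of the four lines depends on the locale's collation rules, so pipe the result through sort if a stable order matters.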

A couple of gotchas:

  • If you have directories whose names are the same except for case, and they contain files whose names are the same except for case, e.g.,

    documents/design.doc
    Documents/Design.doc
    

    then these files will be listed, because only the file names are compared, not the directories that contain them.

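  • If you have files (or directories) whose names contain spaces, tabs, or newlines, this will break, because sort, uniq, and sed here treat whitespace as the field separator and a newline as the end of a record.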
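
One possible way around the spaces-and-tabs problem is sketched below. This is an alternative, not the pipeline above: it prints the file name first, uses "/" as the separator (a file name can never contain "/"), and groups by lowercased name in awk. It assumes GNU find for -printf; the function name dup_names_ws is made up for illustration, and names containing newlines will still break it.

```shell
# Hypothetical whitespace-tolerant variant (spaces and tabs only, not newlines).
# $1 is the starting directory (defaults to the current directory).
dup_names_ws() {
    # %f cannot contain "/", so the first "/" on each line is a safe separator.
    find "${1:-.}" -type f -printf '%f/%p\n' |
    awk -F/ '
        {
            key  = tolower($1)                  # case-folded file name
            path = substr($0, length($1) + 2)   # everything after the first "/"
            count[key]++
            paths[key] = paths[key] path "\n"
        }
        END {
            # Print the paths of every name seen more than once.
            for (key in count)
                if (count[key] > 1)
                    printf "%s", paths[key]
        }'
}
```

Groups are printed in whatever order awk iterates over its array, so the output is not sorted. Unlike the uniq-based version, there is no trailing file name column to strip.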