Finding duplicate files by name ignoring the case in subdirectories

Tags: case-sensitivity, duplicate, filenames, find

How can I list duplicate file names on a Linux system, ignoring case and including all subdirectories?

Files should not be compared by their content but only by their names.
The output should be a list of file names including the path, so that one can run further commands on these files.

Let's assume we have:

ls -1R /tmp/
foo
BAR
barfoo
a/BAr
a/b/bar
c/bAr

The output of the filter/find script should be

/tmp/BAR
/tmp/a/BAr
/tmp/a/b/bar
/tmp/c/bAr

Best Answer

find . -printf "%p %f\n" | sort -f -k2 | uniq -Di -f1

Replace . with your choice of starting directory if you don't want to start at the current directory. Add -type f if you want regular files only (no directories).

The find command produces a list of file (and directory) names, in directory order (i.e., random order, as far as you’re concerned).
-printf "%p %f\n" prints the full pathname (relative to .) and the filename.
sort -f is short for sort --ignore-case, i.e., it sorts the filename list in a case-insensitive way.
-k2 tells it to use the second field (through the end of the line) as the sort key.
uniq -Di -f1 is short for uniq --all-repeated --ignore-case --skip-fields=1, i.e., it shows (all) the lines of output from find that occur repeatedly, based on case-insensitive comparison of the second field and beyond (i.e., that have the same (case-insensitive) file name).

This should give you the output that you want, except each line will have the filename repeated at the end. If you want to get rid of that, pipe into sed 's/ .*//'.
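As a quick check, the full pipeline (including the sed step) can be run against the sample tree from the question. This is a minimal sketch, assuming GNU find, sort, and uniq (-printf and -D are GNU extensions) and names without whitespace; the function name dupes_by_name and the scratch directory are made up for illustration.

```shell
# Hypothetical helper: list paths under "$1" whose base names collide
# when case is ignored. Assumes GNU findutils/coreutils and names
# that contain no whitespace.
dupes_by_name() {
    find "$1" -printf '%p %f\n' | sort -f -k2 | uniq -Di -f1 | sed 's/ .*//'
}

# Recreate the example tree from the question in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/a/b" "$demo/c"
touch "$demo/foo" "$demo/BAR" "$demo/barfoo" \
      "$demo/a/BAr" "$demo/a/b/bar" "$demo/c/bAr"

# Prints the four paths ending in BAR, BAr, bar and bAr.
dupes_by_name "$demo"

rm -rf "$demo"
```

The order of lines within a group depends on the locale's collation, so sort the result again if a stable order matters.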

A couple of gotchas:

If you have directories whose names are the same except for case, and they contain files whose names are the same except for case, e.g.,
```
documents/design.doc
Documents/Design.doc
```
then these will be listed.
If you have files (or directories) whose names contain spaces, tabs, or newlines, this will break, because sort -k2 and uniq -f1 both split fields on whitespace.
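One possible way around the spaces-and-tabs problem is to split each path on / instead of on whitespace, since a file name can never contain a slash. The following is only a sketch, not the method from the answer above: it swaps sort and uniq for awk, the function name dupes_by_name_ws is hypothetical, and names containing newlines are still not handled.

```shell
# Hypothetical helper: like the pipeline above, but splits on "/" rather
# than whitespace, so spaces and tabs in names are tolerated.
# Names containing newlines are still NOT handled.
dupes_by_name_ws() {
    find "$1" -type f | awk -F/ '
        {
            key = tolower($NF)               # base name, case-folded
            count[key]++
            paths[key] = paths[key] $0 "\n"  # remember every full path
        }
        END {
            # Print only the groups that have more than one member.
            for (key in count)
                if (count[key] > 1)
                    printf "%s", paths[key]
        }'
}
```

Calling dupes_by_name_ws /tmp prints each colliding path on its own line. The order of the groups is whatever awk's array traversal produces, and some awk implementations fold only ASCII letters in tolower.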

Related Solutions

cat command – Display File Names with Contents

$ for file in ./tmp/*.txt; do echo "$file";  cat "$file"; done

-or-

$ find ./tmp -maxdepth 1 -name "*.txt" -print -exec cat "{}" \;

How to Match Case Insensitive Patterns with ls

This is actually done by your shell, not by ls.

In bash, you'd use:

shopt -s nocaseglob

and then run your command.

Or in zsh:

unsetopt CASE_GLOB

Or in yash:

set +o case-glob

and then your command.

You might want to put that into .bashrc, .zshrc or .yashrc, respectively.

Alternatively, with zsh:

setopt extendedglob
ls -d -- (#i)*abc*

(that is, turn on case-insensitive globbing on a per-wildcard basis)

With ksh93:

ls -d -- ~(i:*abc*)

You want globbing to work differently, not ls, since those are all file names passed to ls by the shell.
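If changing shell options is not desirable, a portable alternative is to spell out both cases of each letter with bracket expressions, which any POSIX shell expands. This is a small sketch of that idea, not one of the options listed above; the function name ls_abc_nocase is made up for illustration.

```shell
# Hypothetical helper: list entries in directory "$1" whose names contain
# "abc" in any mix of upper and lower case, without touching shell options.
ls_abc_nocase() {
    (
        cd "$1" || exit 1
        # Each bracket expression matches one letter in either case.
        ls -d -- *[aA][bB][cC]*
    )
}
```

This gets unwieldy for long patterns, which is where the shell options above are more convenient. If nothing matches, the pattern is passed to ls unexpanded and ls reports an error.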
