I have a directory that contains a number of sub-directories, and I would like to find any duplicates among them. The folder structure looks something like this:
└── Top_Dir
    ├── Level_1_Dir
    │   ├── standard_cat
    │   │   └── files.txt
    │   ├── standard_dog
    │   │   └── files.txt
    │   └── standard_snake
    │       └── files.txt
    ├── Level_2_Dir
    │   ├── standard_moon
    │   │   └── files.txt
    │   ├── standard_sun
    │   │   └── files.txt
    │   └── standard_cat
    │       └── files.txt
    └── Level_3_Dir
        ├── standard_man
        │   └── files.txt
        ├── standard_woman
        │   └── files.txt
        └── standard_moon
            └── files.txt
With the above example I would like to see an output of:
/top_dir/Level_1_Dir/standard_cat
/top_dir/Level_2_Dir/standard_cat
/top_dir/Level_2_Dir/standard_moon
/top_dir/Level_3_Dir/standard_moon
I have been searching for a way to do this in bash and found nothing. Does anyone know a way to do this?
Best Answer
I had the same problem with my music collection... most tools/scripts were noisy (listing filenames) or did checksums of file contents, which is far too slow...
Special characters, spaces, and symbols made this challenging... the strategy is to MD5sum each directory's sorted child file names along with the parent directory name; the script can then sort the hashes to find duplicates. The child file names must be sorted because find does not guarantee the same file order in two different directories.
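The strategy above can be sketched roughly as follows. This is a minimal illustration, not the original script: the function names dir_signature and find_duplicate_dirs are made up for this example, "parent directory" is assumed to mean the directory's own name rather than its full path, and the code relies on GNU find, sort, uniq, and md5sum (as shipped with Debian). Paths containing newlines are not handled.

```shell
# Hypothetical helper: hash a directory's name plus the sorted names of its
# immediate children. NUL separators keep spaces and symbols unambiguous.
dir_signature() {
    local dir=$1
    {
        printf '%s\0' "$(basename -- "$dir")"
        # find gives no ordering guarantee, so sort the child names.
        find "$dir" -mindepth 1 -maxdepth 1 -printf '%f\0' | LC_ALL=C sort -z
    } | md5sum | cut -d' ' -f1
}

# Hypothetical helper: print every directory under $1 whose signature is
# shared with at least one other directory.
find_duplicate_dirs() {
    local root=$1 dir
    find "$root" -mindepth 1 -type d -print0 |
    while IFS= read -r -d '' dir; do
        printf '%s\t%s\n' "$(dir_signature "$dir")" "$dir"
    done |
    LC_ALL=C sort |
    uniq -w32 -D |   # compare only the 32-character MD5; keep all repeated lines
    cut -f2-         # drop the hash, leaving the path
}
```

Calling find_duplicate_dirs /top_dir on a layout like the one in the question should list the two standard_cat and the two standard_moon directories, grouped by hash. Note that directories with the same name but different child file names are not reported as duplicates under this scheme.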
Bash Script (Debian 10):
Directory structure:
Example output: