FSlint
is a versatile duplicate finder that includes a function for finding duplicate names:

The FSlint package for Ubuntu emphasizes the graphical interface, but as is explained in the FSlint FAQ a command-line interface is available via the programs in /usr/share/fslint/fslint/
. Use the --help
option for documentation, e.g.:
$ /usr/share/fslint/fslint/fslint --help
File system lint.
A collection of utilities to find lint on a filesystem.
To get more info on each utility run 'util --help'.
findup -- find DUPlicate files
findnl -- find Name Lint (problems with filenames)
findu8 -- find filenames with invalid utf8 encoding
findbl -- find Bad Links (various problems with symlinks)
findsn -- find Same Name (problems with clashing names)
finded -- find Empty Directories
findid -- find files with dead user IDs
findns -- find Non Stripped executables
findrs -- find Redundant Whitespace in files
findtf -- find Temporary Files
findul -- find possibly Unused Libraries
zipdir -- Reclaim wasted space in ext2 directory entries
$ /usr/share/fslint/fslint/findsn --help
find (files) with duplicate or conflicting names.
Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...]
If no arguments are supplied the $PATH is searched for any redundant
or conflicting files.
-A reports all aliases (soft and hard links) to files.
If no path(s) specified then the $PATH is searched.
If only path(s) specified then they are checked for duplicate named
files. You can qualify this with -C to ignore case in this search.
Qualifying with -c is more restictive as only files (or directories)
in the same directory whose names differ only in case are reported.
I.E. -c will flag files & directories that will conflict if transfered
to a case insensitive file system. Note if -c or -C specified and
no path(s) specifed the current directory is assumed.
Example usage:
$ /usr/share/fslint/fslint/findsn /usr/share/icons/ > icons-with-duplicate-names.txt
$ head icons-with-duplicate-names.txt
-rw-r--r-- 1 root root 683 2011-04-15 10:31 Humanity-Dark/AUTHORS
-rw-r--r-- 1 root root 683 2011-04-15 10:31 Humanity/AUTHORS
-rw-r--r-- 1 root root 17992 2011-04-15 10:31 Humanity-Dark/COPYING
-rw-r--r-- 1 root root 17992 2011-04-15 10:31 Humanity/COPYING
-rw-r--r-- 1 root root 4776 2011-03-29 08:57 Faenza/apps/16/DC++.xpm
-rw-r--r-- 1 root root 3816 2011-03-29 08:57 Faenza/apps/22/DC++.xpm
-rw-r--r-- 1 root root 4008 2011-03-29 08:57 Faenza/apps/24/DC++.xpm
-rw-r--r-- 1 root root 4456 2011-03-29 08:57 Faenza/apps/32/DC++.xpm
-rw-r--r-- 1 root root 7336 2011-03-29 08:57 Faenza/apps/48/DC++.xpm
-rw-r--r-- 1 root root 918 2011-03-29 09:03 Faenza/apps/16/Thunar.png
If the files are all in the same dir, you can:
ls |
awk -F_ '{ i=$1; m=$2; s=$3; f[i"_"s] = f[i"_"s] " " $0 }
END{ for(insc in f)
printf "paste%s >out_%s.txt\n",f[insc],insc
}'
which splits the filename on "_" (-F_
), sets the variables i,m,s
to the first 3 parts of the filename (institute,model,scenario),
and accumulates in array f the filename. The array is indexed
only by the institute and scenario, so all the models are concatenated
(m isn't used). The final END prints the f array, and uses the index (institute_scenario) as the
name for the output file. With your examples this produces
paste wbm_gfdl_rcp8p5_mississippi.txt wbm_hadgem_rcp8p5_mississippi.txt >out_wbm_rcp8p5.txt
paste matsiro_hadgem_rcp4p5_mississippi.txt matsiro_ipsl_rcp4p5_mississippi.txt >out_matsiro_rcp4p5.txt
paste matsiro_gfdl_rcp8p5_mississippi.txt matsiro_miroc_rcp8p5_mississippi.txt >out_matsiro_rcp8p5.txt
You then need to pipe this into the shell to have it executed. Add | sh
to the last line above to do this.
To remove some columns from the input files, you need to alter the awk line
that is collecting all the input filenames. In the 1st awk line:
{ i=$1; m=$2; s=$3; f[i"_"s] = f[i"_"s] " " $0 }
the filename is the "$0". For example, if you change this line into:
{ i=$1; m=$2; s=$3; f[i"_"s] = f[i"_"s] sprintf(" <(cut -f4 %s)",$0) }
then you will get the example output:
paste <(cut -f4 wbm_gfdl_rcp8p5_mississippi.txt) <(cut -f4 wbm_hadgem_rcp8p5_mississippi.txt) >out_wbm_rcp8p5.txt
but if you want to cut only the 2nd filename, it is a bit more complicated and
you need this instead:
{ i=$1; m=$2; s=$3;
if(f[i"_"s]=="")add = $0; else add = sprintf("<(cut -f4 %s)",$0);
f[i"_"s] = f[i"_"s] " " add }
so you will get
paste wbm_gfdl_rcp8p5_mississippi.txt <(cut -f4 wbm_hadgem_rcp8p5_mississippi.txt) >out_wbm_rcp8p5.txt
If sh
does not understand the syntax <(cut ...)
then replace it by bash
.
Best Answer
There are at least a hundred thousand million different ways of approaching this but here are the top contenders:
The Bash for loop
Using
find
The
find
command has a lovely littleexec
command that's great for running things (with some caveats). Find is better than basic globbing because you can really filter down on the files you're selecting. Be careful of the odd syntax.Note that
find
is recursive so you might want to use-maxdepth 1
to keepfind
in current working directory.-type f
can be used to filter out regular files.If we're just renaming doc to txt...
The
rename
command is sed-like in searching. Obviously this won't do anything to convert the format.