Remove duplicates by renaming identical files to the same name

command-line, filenames, rename

I work in graphic design. I've downloaded many files (EPS files, PSD files, etc.) from various websites.

Because they come from more than 10 different websites, I've ended up with many copies of the same file: same size, same content, same everything, but different file names (2 to 4 copies per file). Removing the duplicates by manually opening them one by one is very time-consuming.

I hope there is a way to rename all the downloaded files so that different files get unique names (I don't mind if the new names are not descriptive).

For example, two identical files: nice-sun.eps downloaded from site 1, and 678.eps downloaded from site 2. After renaming, they would end up with the same file name.

Best Answer

This command will rename every file to the MD5 hash of its content. That means files with the same content will end up with the same name:

for f in *; do mv "$f" "$(md5sum "$f" | cut -d " " -f 1)"; done

You can replace md5sum with sha1sum in the command.
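
Note that this discards the original extension, which matters for EPS and PSD files. Here is a minimal sketch of a variant that keeps the extension, assuming bash and that every file name actually has an extension (for a name with no dot, ${f##*.} yields the whole name):

for f in *; do
  hash=$(md5sum "$f" | cut -d " " -f 1)
  ext="${f##*.}"            # everything after the last dot
  mv -v "$f" "$hash.$ext"
done

With this variant, duplicates still collapse into one file as long as they share the same extension.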

For this demonstration I added -v to mv so we can see what is being renamed.

$ echo 1 > a
$ echo 2 > b
$ echo 1 > c
$ ls -1
a
b
c
$ for f in *; do mv -v "$f" "$(md5sum "$f" | cut -d " " -f 1)"; done
`a' -> `b026324c6904b2a9cb4b88d6d61c81d1'
`b' -> `26ab0db90d72e28ad0ba1e22ee510510'
`c' -> `b026324c6904b2a9cb4b88d6d61c81d1'
$ ls -1
26ab0db90d72e28ad0ba1e22ee510510
b026324c6904b2a9cb4b88d6d61c81d1

You can also safely re-run this command in a directory where some files have already been renamed to their hash while others have not.

$ echo 1 > d
$ echo 2 > e
$ ls -1
26ab0db90d72e28ad0ba1e22ee510510
b026324c6904b2a9cb4b88d6d61c81d1
d
e
$ for f in *; do mv -v "$f" "$(md5sum "$f" | cut -d " " -f 1)"; done
mv: `26ab0db90d72e28ad0ba1e22ee510510' and `26ab0db90d72e28ad0ba1e22ee510510' are the same file
mv: `b026324c6904b2a9cb4b88d6d61c81d1' and `b026324c6904b2a9cb4b88d6d61c81d1' are the same file
`d' -> `b026324c6904b2a9cb4b88d6d61c81d1'
`e' -> `26ab0db90d72e28ad0ba1e22ee510510'
$ ls -1
26ab0db90d72e28ad0ba1e22ee510510
b026324c6904b2a9cb4b88d6d61c81d1

Note that it will still compute the hash of files that have already been renamed to their hash. So if the files are huge, you might want to avoid rehashing them.
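
A minimal sketch of one way to skip them, assuming bash and that hashed names are exactly 32 lowercase hex characters (the format md5sum prints):

for f in *; do
  # Skip names that already look like an MD5 hash (32 hex characters).
  if [[ "$f" =~ ^[0-9a-f]{32}$ ]]; then
    continue
  fi
  mv -v "$f" "$(md5sum "$f" | cut -d " " -f 1)"
done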
