Faster way to rename duplicate files (identified by fdupes) in another directory

duplicate, fdupes, files, rename, scripting

I have a directory full of pdf files of journal articles, most of which are named by their bibtex key. Some time ago I made a backup on an external hard drive, but I haven't kept it up to date and there are tons of duplicates with different names. I want to get the two directories back into sync and delete the extra files.

Using fdupes I have identified a bunch of these, and now I have a nice paired list of them. However, most of the duplicates on the external drive have meaningless names. I'd like to rename them to be the same as the duplicate in the first directory, rather than deleting them and copying them over again, because there are so many of them. So I don't want to just use rsync.

For example, if the fdupes output is:

/home/articles/bibtex.pdf
/external/articles/morearticles44.pdf

Is there a faster way than writing

mv /external/articles/morearticles44.pdf /external/articles/bibtex.pdf

for each pair of duplicates?

Best Answer

In my experience fdupes can be inconsistent in the order that it outputs files (I have had my own problems using the --delete option). This should be fairly robust as it doesn't require the files to be in a specific order (as long as each set always contains two duplicates in different directories):

# note no trailing slash
source_dir=/home/articles
target_dir=/external/articles

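# fdupes lists each set of duplicates one path per line, with a blank line between sets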
fdupes "$target_dir" "$source_dir" |
  while IFS= read -r file; do
    case "$file" in
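      # duplicate in the source directory: remember only its filename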
      "$source_dir/"*)
         source=${file##*/}
         ;;
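      # duplicate on the external drive: remember its full path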
      "$target_dir/"*)
         target=$file
         ;;
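      # blank line marks the end of a set: rename the external copy to match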
      '')
         if [ "$source" ] && [ "$target" ]; then
           echo mv -i "$target" "$target_dir/$source"
         fi
         unset source target
         ;;
    esac
  done

This will just print out the mv commands; remove the echo once you are sure the output is what you want. The -i option will also make mv prompt you before it overwrites anything.
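
With the example pair from the question, the dry run should print something like:

mv -i /external/articles/morearticles44.pdf /external/articles/bibtex.pdf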
