Shell – How to prepare files for rsync on a case insensitive filesystem

case sensitivity, filenames, rsync, shell-script, zsh

I am transferring a large number of files to an HFS+ filesystem.

The files are currently on ext2 partitions.

I have conflicts due to the case insensitivity of the target partition (HFS+).

I want to identify the files whose names collide once converted to lower case, and delete them if they are actually duplicates.

I also found that I will have duplicate folder names if I convert everything to lower case. Basically, these hard drives contain years of unsorted data, and I happen to have this problem with folder names too.

Does this seem reasonable:

find . -type f | while read f; do echo $f:l; done | sort | uniq -d 

$f:l is zsh syntax for converting $f to lower case.

Now I want to keep only one instance of each file that has duplicates.
How can I do this efficiently?

I do not want to find files with duplicate content, unless they have the same lower case filename. I will deal with duplicates later.

Best Answer

The second step in your pipeline is slightly broken (it mangles backslashes and leading and trailing whitespace) and is a complicated way of doing this. Use tr to convert to lowercase. You shouldn't limit the search to files: directories can collide too.

find . | tr '[:upper:]' '[:lower:]' | LC_ALL=C sort | LC_ALL=C uniq -d
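As an aside on the breakage mentioned above, here is a small demonstration of what a plain `read` does to a line compared with `IFS= read -r`. The sample name is made up for illustration; the behaviour shown is that of POSIX-style shells (sh, bash, zsh).

```shell
# A made-up name with leading/trailing spaces and a backslash.
name='  a\tb  '

# Plain read: strips surrounding whitespace and eats the backslash.
plain=$(printf '%s\n' "$name" | { read f; printf '%s' "$f"; })

# IFS= read -r: keeps the line exactly as it came in.
safe=$(printf '%s\n' "$name" | { IFS= read -r f; printf '%s' "$f"; })

printf 'plain: [%s]\nsafe:  [%s]\n' "$plain" "$safe"
# plain: [atb]
# safe:  [  a\tb  ]
```

So if you do keep a `while read` loop, `while IFS= read -r f` is the safer form, though it still cannot cope with newlines in names.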

Note that this pipeline only works if file names don't contain newlines. Under Linux (GNU sort and uniq), switch to null bytes as the separator to cope with newlines.

find . -print0 | tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -z | LC_ALL=C uniq -dz

This prints the lowercase versions of file names, which isn't really conducive to doing something about the files.
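One way to keep the original names, as a rough sketch: group the paths by their lowercase form with awk and print every group that has more than one member. The function name `list_case_collisions` is made up for illustration, and the sketch assumes one path per line, so names containing newlines are not handled.

```shell
# Hypothetical helper: read newline-separated paths on stdin and print
# each group of paths that differ only by case.
# Usage: find . | list_case_collisions
list_case_collisions() {
  awk '
    {
      key = tolower($0)                         # lowercase form is the group key
      count[key]++
      members[key] = members[key] "    " $0 "\n" # remember the original spelling
    }
    END {
      for (key in count)
        if (count[key] > 1)
          printf "%s:\n%s", key, members[key]
    }
  '
}
```

Each group is printed as the lowercase path followed by the indented original paths. The order of the groups is unspecified, because awk does not guarantee an iteration order for `for (key in count)`.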

If you're using zsh, forget about find: zsh has everything you need built in.

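setopt extended_glob   # required for the (#i) case-insensitive flag
for x in **/*; do
  # Every entry in the same directory whose name equals $x:t, ignoring case
  conflicts=($x:h/(#i)$x:t)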
  if (($#conflicts > 1)); then
    ## Are all the files identical regular files?
    h=()
    for c in $conflicts; do 
      if [[ -f $c ]]; then
        h+=(${$(md5sum <$c)%% *})
      else
        h=(not regular)
        break
      fi
    done
    # (@u) keeps unique elements: a single distinct hash means identical contents
    if (( ${#${(@u)h}} == 1 )); then
      # Identical regular files: delete all but the last match
      rm -- ${conflicts[1,-2]}
    else
      echo >&2 "Conflicting files:"
      printf >&2 '    %s\n' $conflicts
    fi
  fi
done
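If hashing large files is a concern, a byte-for-byte comparison with cmp is one possible alternative to md5sum, since cmp can stop at the first difference. The following is a minimal sketch in portable sh, not part of the script above; the helper name `all_identical` is hypothetical.

```shell
# Hypothetical helper: succeed only if at least two arguments are given
# and every one is a regular file with the same bytes as the first.
all_identical() {
  [ "$#" -ge 2 ] || return 1
  first=$1
  for f in "$@"; do
    [ -f "$f" ] || return 1                 # directories etc. never count as identical
    cmp -s -- "$first" "$f" || return 1     # -s: silent, only the exit status matters
  done
  return 0
}
```

Under that assumption, the hash loop in the zsh script could be replaced by a test such as `if all_identical $conflicts; then`. It would be prudent to run it with `echo rm` first to see what would be deleted.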