How to Recursively Replace Invalid Characters in Filenames Using Rename

Tags: bash, find, rename, wildcards, zsh

I'm looking to write a script that will find any characters other than A-Za-z0-9._- (that is, anything matching [^A-Za-z0-9._-]) in the names of files of a specific type (in this case .avi) and replace them with an underscore _. I want this to run recursively from the current path. I'm using Ubuntu.

I do not want or need to change any folder names as I have control over those at the time of creation. Here's what I have now after referencing the following link:

SE reference

find . -depth -exec rename 's!([^/]*\Z)/[^A-Za-z0-9._-]/_/g' *.avi {} +

I'm clearly missing something. Please help. 🙂

Best Answer

Why your code doesn't work

The wildcard pattern *.avi is expanded by the shell that runs find, before find itself starts, so its effect depends on whether there are *.avi files in the current directory or not. See "find not recursive when file at top" for a fuller explanation. To expand *.avi in subdirectories, you'd need to do three things differently: quote the pattern so that the original shell doesn't expand it; arrange to run an additional shell in each subdirectory to perform the wildcard expansion; and have find look for directories only rather than any file type.
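To see the expansion problem in isolation, here is a minimal sketch, assuming a scratch directory created with mktemp (the names top.avi and sub/inner.avi are made up for illustration). It records what a command would receive for an unquoted and for a quoted *.avi:

```shell
# Scratch tree: one .avi at the top level, one in a subdirectory
# (hypothetical names, used only for this demonstration).
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir sub
touch top.avi sub/inner.avi

# Unquoted: the shell replaces *.avi with the matching names in the
# current directory only, before the command (find, printf, ...) starts.
unquoted=$(printf '%s\n' *.avi)

# Quoted: the pattern reaches the command unchanged, so something like
# find's -name could apply it at every level of the tree.
quoted=$(printf '%s\n' '*.avi')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

The unquoted form never mentions sub/inner.avi, which is why the original command appears not to recurse as soon as a matching file sits at the top level.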

In addition, your code ends up calling rename on every file at any level under the current directory, including on subdirectories themselves, via {} +. So rename operates on directories, not just regular files.
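A small sketch of that second point, again in a throwaway mktemp directory with a made-up file name: without a type test, find passes directories along with everything else, while ! -type d leaves them out.

```shell
# Scratch tree with one subdirectory and one file (hypothetical names).
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/sub"
touch "$tmp/sub/clip.avi"

# No type test: the directories . and ./sub are listed too, so a command
# run via -exec ... {} + would receive them as arguments.
everything=$(cd "$tmp" && find . -depth | LC_ALL=C sort)

# ! -type d keeps only non-directories.
non_dirs=$(cd "$tmp" && find . -depth ! -type d)

printf 'all entries:\n%s\n\nnon-directories:\n%s\n' "$everything" "$non_dirs"
```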

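Furthermore, there's a syntax error in your Perl code: s!... makes ! the delimiter, but the expression never contains a second !, so the substitution is never terminated. It also mixes in /-delimited pieces that belong to a different substitution.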

Working solution with zsh

autoload -Uz zmv # best in ~/.zshrc

zmv -n '(**/)(*.avi)(#qD^/)' '$1${2//[^a-zA-Z0-9._-]/_}'

^/ selects any type of file other than directories; replace it with . to match regular files only. D makes the pattern include hidden files as well. -n performs a dry run; remove it once you are happy with the proposed renames.

Working solution with find and rename

With the perl-based variants of rename and a find implementation that supports -execdir:

LC_ALL=C find . -depth -name '*[!a-zA-Z0-9._-]*.avi' ! -type d -execdir \
  rename 's/[^a-zA-Z0-9._-]/_/g' {} +
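If you want to preview the effect without having rename installed, the following is a rough sketch rather than a replacement for the command above. It swaps rename for tr and only prints the proposed names; preview_renames is a hypothetical helper name, it assumes its argument is a directory, and it uses -exec with an inline sh instead of -execdir.

```shell
# preview_renames DIR
# Print "old -> new" for every non-directory under DIR whose name ends in
# .avi and contains at least one byte outside a-zA-Z0-9._-.
# Nothing is renamed; this is a dry run only.
preview_renames() {
  LC_ALL=C find "$1" -depth -name '*[!a-zA-Z0-9._-]*.avi' ! -type d -exec sh -c '
    for path do
      dir=${path%/*}       # everything before the last slash
      base=${path##*/}     # the file name itself
      # Byte-wise replacement of the file name only; the directory part
      # is left untouched.
      new=$(printf "%s" "$base" | tr -c "a-zA-Z0-9._-" "_")
      printf "%s -> %s\n" "$path" "$dir/$new"
    done
  ' sh {} +
}

# Example: preview_renames .
```

Because only the last path component is rewritten, directory names containing spaces or other characters are reported unchanged, which matches the requirement in the question.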

There are a few caveats with that approach though:

  • That runs at least one rename instance per directory containing files to rename, or one rename per file with the find implementations/versions where -execdir ... {} + is actually the same as -execdir ... {} \;. (zmv runs one mv per file, but you can make mv a builtin with zmodload zsh/files to speed it up.)
  • With -execdir, find runs the command in the directory that contains those files and passes a path relative to that directory to the command. Some find implementations (the GNU one) add a ./ prefix to the files, some don't. Some variants of rename accept options after the Perl expression, which means that a file whose name starts with - could cause problems when no ./ prefix is added.
  • We have to use LC_ALL=C for -name to work even if file names contain sequences of bytes that otherwise wouldn't form valid characters in the locale. rename inherits that setting, and in most variants it only works with ASCII anyway. That means, however, that it will replace a multi-byte character with as many _ as the character has bytes. For instance, it would rename a UTF-8 stéphane to st__phane instead of st_phane. zsh is OK because it converts each multi-byte character, and each byte that can't be decoded into a character, into a single _.
  • Contrary to zsh's zmv, it won't perform sanity checks before it starts renaming (for example, that two files such as a+b.avi and a@b.avi are not going to end up with the same name). rename should, however, not overwrite existing files.
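The last two caveats can be checked by hand before renaming anything. This is a minimal sketch using tr as a stand-in for the byte-wise substitution that rename performs under LC_ALL=C; sanitize_bytes is a hypothetical helper name and the sample names are the ones from the caveats above.

```shell
# Hypothetical helper: read one name per line and replace every byte that
# is not in a-zA-Z0-9._- with _. Newlines are kept as line separators.
sanitize_bytes() {
  LC_ALL=C tr -c 'a-zA-Z0-9._\n-' '_'
}

# "stéphane" with é written as its two UTF-8 bytes (octal 303 251):
# each byte becomes its own underscore.
bytewise=$(printf 'st\303\251phane\n' | sanitize_bytes)

# Names that would collide after sanitizing: uniq -d prints each
# duplicated target name once.
collisions=$(printf '%s\n' 'a+b.avi' 'a@b.avi' 'c.avi' | sanitize_bytes | sort | uniq -d)

printf 'bytewise: %s\ncollisions: %s\n' "$bytewise" "$collisions"
```

Feeding the real file names of one directory through sanitize_bytes | sort | uniq -d before running the rename gives a cheap collision check similar in spirit to what zmv does, as long as no file name contains a newline.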