Shell Script – How to Replace Spaces in All File Names with Underscore in Linux

filesfindmvrenameshell-script

I tried following shell script which should replace spaces from all xml filenames

for xml_file in $(find $1 -name "* .xml" -type f);
do
 echo "removing spaces from XML file:" $xml_file
 mv "$xml_file" "${xml_file// /_}";
done

Suppose, I have xml file with the name xy z.xml, then it gives:

removing spaces from XML file: /home/krishna/test/xy
mv: cannot stat `/home/krishna/test/xy': No such file or directory
removing spaces from XML file: .xml
mv: cannot stat `z.xml': No such file or directory

Best Answer

Use this with bash:

find $1 -name "* *.xml" -type f -print0 | \
  while read -d $'\0' f; do mv -v "$f" "${f// /_}"; done

find will search for files with a space in the name. The filenames will be printed with a nullbyte (-print0) as delimiter to also cope with special filenames. Then the read builtin reads the filenames delimited by the nullbyte and finally mv replaces the spaces with an underscore.

EDIT: If you want to remove the spaces in the directories too, it's a bit more complicated. The directories are renamed and then not anymore accessible by the name find finds. Try this:

find -name "* *" -print0 | sort -rz | \
  while read -d $'\0' f; do mv -v "$f" "$(dirname "$f")/$(basename "${f// /_}")"; done

The sort -rz reverses the file order, so that the deepest files in a folder are the first to move and the folder itself will be the last one. So, there are never folders renamed before all files and folder are rename inside of it. The mv command in the loop is a bit changed too. In the target name, we only remove the spaces in the basename of the file, else it wouldn't be accessible.

Related Solutions

Shell Metacharacters – How to Escape Shell Metacharacters Automatically with Find Command

It's so much easier with zsh globs here:

for f (**/*.xml(.)) (mv -v -- $f **/$f:r:t(/[1]))

Or if you want to include hidden xml files and look inside hidden directories like find would:

for f (**/*.xml(.D)) (mv -v -- $f **/$f:r:t(D/[1]))

But beware that files called .xml, ..xml or ...xml would become a problem, so you may want to exclude them:

setopt extendedglob
for f (**/(^(|.|..)).xml(.D)) (mv -v -- $f **/$f:r:t(D/[1]))

With GNU tools, another approach to avoid having to scan the whole directory tree for each file would be to scan it once and look for all directories and xml files, record where they are and do the moving in the end:

(export LC_ALL=C
find . -mindepth 1 -name '*.xml' ! -name .xml ! \
  -name ..xml ! -name ...xml -type f -printf 'F/%P\0' -o \
  -type d -printf 'D/%P\0' | awk -v RS='\0' -F / '
  {
    if ($1 == "F") {
      root = $NF
      sub(/\.xml$/, "", root)
      F[root] = substr($0, 3)
    } else D[$NF] = substr($0, 3)
  }
  END {
    for (f in F)
      if (f in D) 
        printf "%s\0%s\0", F[f], D[f]
  }' | xargs -r0n2 mv -v --
)

Your approach has a number of problems if you want to allow any arbitrary file name:

embedding {} in the shell code is always wrong. What if there's a file called $(rm -rf "$HOME").xml for instance? The correct way is to pass those {} as argument to the in-line shell script (-exec sh -c 'use as "$1"...' sh {} \;).
With GNU find (implied here as you're using -quit), *.xml would only match files consisting of a sequence of valid characters followed by .xml, so that excludes file names that contain invalid characters in the current locale (for instance file names in the wrong charset). The fix for that is to fix the locale to C where every byte is a valid character (that means error messages will be displayed in English though).
If any of those xml files are of type directory or symlink, that would cause problems (affect the scanning of directories, or break symlinks when moved). You may want to add a -type f to only move regular files.
Command substitution ($(...)) strips all trailing newline characters. That would cause problems with a file called foo␤.xml for instance. Working around that is possible but a pain: base=$(basename "$1" .xml; echo .); base=${base%??}. You can at least replace basename with the ${var#pattern} operators. And avoid command substitution if possible.
your problem with file names containing wildcard characters (?, [, * and backslash; they are not special to the shell, but to the pattern matching (fnmatch()) done by find which happens to be very similar to shell pattern matching). You'd need to escape them with a backslash.
the problem with .xml, ..xml, ...xml mentioned above.

So, if we address all of the above, we end up with something like:

LC_ALL=C find . -type f -name '*.xml' ! -name .xml ! -name ..xml \
  ! -name ...xml -exec sh -c '
  for file do
    base=${file##*/}
    base=${base%.xml}
    escaped_base=$(printf "%s\n" "$base" |
      sed "s/[[*?\\\\]/\\\\&/g"; echo .)
    escaped_base=${escaped_base%??}
    find . -name "$escaped_base" -type d -exec mv -v "$file" {\} \; -quit
  done' sh {} +

Phew...

Now, it's not all. With -exec ... {} +, we run as few sh as possible. If we're lucky, we'll run only one, but if not, after the first sh invocation, we'll have moved a number of xml files around, and then find will continue looking for more, and may very well find the files we have moved in the first round again (and most probably try to move them where they are).

Other than that, it's basically the same approach as the zsh ones. A few other notable differences:

with the zsh one, the file list is sorted (by directory name and file name), so the destination directory is more or less consistent and predictable. With find, it's based on the raw order of files in directories.
with zsh, you'll get an error message if no matching directory to move the file to is found, not with the find approach above.
With find, you'll get error messages if some directories cannot be traversed, not with the zsh one.

A last note of warning. If the reason you get some files with dodgy file names is because the directory tree is writable by an adversary, then beware than none of the solutions above are safe if the adversary may rename files under the feet of that command.

For instance, if you're using LXDE, the attacker could make a malicious foo/lxde-rc.xml, create a lxde-rc folder, detect when you're running your command and replace that lxde-rc with a symlink to your ~/.config/openbox/ during the race window (which can be made as large as necessary in many ways) between find finding that lxde-rc and mv doing the rename("foo/lxde-rc.xml", "lxde-rc/lxde-rc.xml") (foo could also be changed to that symlink making you move your lxde-rc.xml elsewhere).

Working around that is probably impossible using standard or even GNU utilities, you'd need to write it in a proper programming language, doing some safe directory traversal and using renameat() system calls.

All the solutions above will also fail if the directory tree is deep enough that the limit on the length of the paths given to the rename() system call done by mv is reached (causing rename() to fail with ENAMETOOLONG). A solution using renameat() would also work around the problem.

Best Answer

Related Solutions

Shell Metacharacters – How to Escape Shell Metacharacters Automatically with Find Command

Related Question