I guess you see this � invalid character because the name contains a byte sequence that isn't valid UTF-8. File names on typical Unix filesystems (including yours) are byte strings, and it's up to applications to decide which encoding to use. Nowadays there is a trend toward UTF-8, but it's not universal, especially in locales that could never live with plain ASCII and have been using other encodings since before UTF-8 existed.
Try
LC_CTYPE=en_US.iso88591 ls
to see if the file name makes sense in ISO-8859-1 (latin-1). If it doesn't, try other locales. Note that only the LC_CTYPE locale setting matters here.
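To make the "same bytes, different encoding" point concrete, here is a minimal sketch that decodes one byte string under a few candidate encodings with iconv. The sample name, the list of encodings, and the helper show_as are assumptions made up for illustration; which encodings your iconv supports depends on the system.

```shell
# A name containing the single byte 0xE9, which is not valid UTF-8 on its own.
name=$(printf 'caf\351.txt')

# show_as is a hypothetical helper: decode the bytes of $2 as encoding $1
# and print the result as UTF-8.
show_as() {
    printf '%s' "$2" | iconv -f "$1" -t UTF-8
}

# Print how the same bytes read under each candidate encoding.
for enc in ISO-8859-1 ISO-8859-15 KOI8-R; do
    printf '%-12s ' "$enc"
    show_as "$enc" "$name" 2>/dev/null || printf '(conversion not available)'
    printf '\n'
done
```

Whichever line looks like a sensible name to a human is a good hint about the encoding the file was created with.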
In a UTF-8 locale, the following command will show you all files whose name is not valid UTF-8:
grep-invalid-utf8 () {
  perl -l -ne '/^([\000-\177]|[\300-\337][\200-\277]|[\340-\357][\200-\277]{2}|[\360-\367][\200-\277]{3}|[\370-\373][\200-\277]{4}|[\374-\375][\200-\277]{5})*$/ or print'
}
find | grep-invalid-utf8
You can check if they make more sense in another locale with recode or iconv:
find | grep-invalid-utf8 | recode latin1..utf8
find | grep-invalid-utf8 | iconv -f latin1 -t utf8
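As a cross-check on the Perl filter, a similar test can probably be done with iconv alone: converting from UTF-8 to UTF-8 fails when the input is not valid UTF-8. This is only a sketch; is_valid_utf8 and list_invalid_utf8 are hypothetical names, it spawns one iconv per path (slow on big trees), and it assumes no file name contains a newline.

```shell
# Succeeds if the string $1 is valid UTF-8, fails otherwise.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print every path under directory $1 that is not valid UTF-8.
# Assumption: no path contains a newline character.
list_invalid_utf8() {
    find "$1" -print | while IFS= read -r path; do
        is_valid_utf8 "$path" || printf '%s\n' "$path"
    done
}
```

Calling list_invalid_utf8 . should print roughly the same paths as the Perl-based filter, though the two may disagree on edge cases such as overlong sequences.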
Once you've determined that a bunch of file names are in a certain encoding (e.g. latin1), one way to rename them is
find | grep-invalid-utf8 |
rename 'BEGIN {binmode STDIN, ":encoding(latin1)"; use Encode;}
$_=encode("utf8", $_)'
This uses the perl rename command available on Debian and Ubuntu. You can pass it -n to show what it would be doing without actually renaming the files.
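If the perl rename command isn't available, the same idea can be sketched with iconv and mv for a single file. The helper to_utf8_name is hypothetical, ISO-8859-1 is just the assumed source encoding, and names ending in a newline are not handled because command substitution strips trailing newlines.

```shell
# Rename one file whose base name is ISO-8859-1 to its UTF-8 equivalent.
# $1: path to the file. Only the base name is converted, not the directory.
to_utf8_name() {
    dir=$(dirname -- "$1")
    old=$(basename -- "$1")
    new=$(printf '%s' "$old" | iconv -f ISO-8859-1 -t UTF-8) || return 1
    # Nothing to do for pure ASCII names.
    [ "$old" != "$new" ] || return 0
    # Refuse to clobber an existing file.
    [ ! -e "$dir/$new" ] || return 1
    mv -- "$1" "$dir/$new"
}
```

It only makes sense to call this on names you have already identified as invalid UTF-8; a name that is already UTF-8 would be double-encoded.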
Since you're going to rename directories under find's nose, tell it to act on the content of a directory before the directory itself, with -depth. On the other hand, doing directories separately from regular files doesn't help.
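A quick way to see what -depth changes is to list a small tree with it. The directory names below are made up for the demonstration.

```shell
# Build a throwaway tree: a/b/f
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/a/b/f"

# With -depth, find reports a directory's content before the directory itself,
# so renaming a directory never invalidates a path that is still to be visited.
order=$(cd "$tmp" && find a -depth)
printf '%s\n' "$order"
# prints:
# a/b/f
# a/b
# a
```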
To rename a file with the tools that are available on a default CentOS installation, you can use a shell and mv. Take care to change only the base name, not the directory name (since the new directory doesn't exist yet).
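# Only act on names that contain a space, so that mv is never asked
# to rename a file to itself (which would also happen for "." itself).
find . -depth -name '* *' -exec bash -c '
  for filename do
    basename=${filename##*/}
    mv "$filename" "${filename%/*}/${basename// /-}"
  done
' _ {} +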
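The parameter expansions in that loop are what keep the directory part untouched. Here is a small sketch of the same splitting on a made-up path. The ${var// /-} form is an extension found in bash, ksh and zsh, so this sketch uses tr for the replacement step as a more portable stand-in.

```shell
filename='./some dir/my holiday video.avi'

dir=${filename%/*}       # strip the shortest trailing /...  -> ./some dir
base=${filename##*/}     # strip the longest leading .../    -> my holiday video.avi

# Portable stand-in for bash's ${base// /-}
newbase=$(printf '%s' "$base" | tr ' ' '-')

target=$dir/$newbase
printf '%s\n' "$target"  # prints: ./some dir/my-holiday-video.avi
```

Note that the directory keeps its space here: it gets renamed later, when find reaches it as an entry of its own parent.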
Best Answer
Why your code doesn't work
The wildcard pattern *.avi is expanded by the shell that runs find before running find, so its effect depends on whether there are *.avi files in the current directory or not. See "find not recursive when file at top" for more explanations. To expand *.avi in subdirectories, you'd need to do three things differently: quote the pattern so that the original shell doesn't expand it; arrange to run an additional shell in each subdirectory to perform the wildcard expansion; and look for directories only with the find command rather than any file type.
In addition, your code ends up calling rename on every file at any level under the current directory, including on subdirectories themselves, via {} +. So rename operates on directories, not just regular files.
Furthermore, there's a syntax error in your Perl code.
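The effect of the early wildcard expansion is easy to reproduce in a scratch directory. The file names below are made up for this sketch.

```shell
# Scratch tree with one .avi at the top level and one in a subdirectory.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/top.avi" "$tmp/sub/inner.avi"
cd "$tmp" || exit 1

# Unquoted: the shell expands *.avi first, so find actually receives
# "find . -name top.avi" and never looks for other .avi files.
unquoted=$(find . -name *.avi)

# Quoted: find receives the pattern itself and matches at every level.
quoted=$(find . -name '*.avi' | LC_ALL=C sort)

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

With two or more .avi files at the top level, the unquoted form would instead pass extra arguments to find and most likely fail with an error.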
Working solution with zsh
^/ is to select any type of file other than directory. Replace with . for regular files only.
-n is for dry-run. Remove when happy.
Working solution with find and rename
With the perl-based variants of rename and a find implementation that supports -execdir:
There are a few caveats with that approach though:
It runs one rename instance per directory containing files to rename (or one rename per file with some find implementations/versions where -execdir ... {} + is actually the same as -execdir ... {} \;). zmv runs one mv per file, but you can make mv builtin with zmodload zsh/files to speed it up.
With -execdir, find runs the command in the directory that contains those files and passes a path relative to that directory to the command. Some find implementations (the GNU one) add a ./ prefix to the files, some don't. Some variants of rename accept options after the perl expression, which means that a file whose name starts with - could cause problems.
LC_ALL=C is needed for -name to work even if file names contain sequences of bytes that otherwise wouldn't form valid characters in the locale. rename inherits that setting and, in most variants, only works with ASCII anyway. That means, however, that it will replace a multi-byte character with as many _ as the character has bytes. For instance, it would rename a UTF-8 stéphane to st__phane instead of st_phane. zsh is OK because it converts each multi-byte character, and each byte that can't be decoded into a character, into a single _.
Unlike zsh's zmv, it won't perform sanity checks (for example, that two files such as a+b.avi and a@b.avi are not going to end up with the same name) before it starts renaming. rename should, however, not overwrite existing files.