I guess you see this � invalid character because the name contains a byte sequence that isn't valid UTF-8. File names on typical Unix filesystems (including yours) are byte strings, and it's up to applications to decide which encoding to use. Nowadays there is a trend towards UTF-8, but it's not universal, especially in locales that could never make do with plain ASCII and have been using other encodings since before UTF-8 even existed.
Try LC_CTYPE=en_US.iso88591 ls
to see if the file name makes sense in ISO-8859-1 (latin-1). If it doesn't, try other locales (locale -a lists the ones installed on your system). Note that only the LC_CTYPE locale setting matters here.
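To see that a file name really is just a string of bytes, here is a small sketch that creates a file whose name contains the latin-1 encoding of "é" (the single byte 0xE9, which is not valid UTF-8 on its own) and dumps the name's bytes with od, whose output does not depend on the locale. It assumes mktemp -d is available (GNU and BSD provide it; POSIX does not specify it) and works in a scratch directory.

```shell
# Work in a throwaway directory (mktemp -d is an assumption, not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# printf turns the octal escape \351 into the single byte 0xE9.
name=$(printf 'caf\351')
: > "$name"              # create an empty file with that byte-string name

# Show the raw bytes of the name as stored: 63 61 66 e9
printf '%s' * | od -An -tx1
```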
In a UTF-8 locale, the following command will show you all files whose name is not valid UTF-8:
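grep-invalid-utf8 () {
  # Print each input line that is not made of UTF-8-shaped byte sequences.
  # The check is loose: it also accepts overlong forms and the obsolete
  # 5- and 6-byte sequences.
  perl -l -ne '/^([\000-\177]|[\300-\337][\200-\277]|[\340-\357][\200-\277]{2}|[\360-\367][\200-\277]{3}|[\370-\373][\200-\277]{4}|[\374-\375][\200-\277]{5})*$/ or print'
}
find . | grep-invalid-utf8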
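If perl isn't available, a swapped-in technique is to ask iconv to convert a name from UTF-8 to UTF-8 and treat failure as invalid input. This is a minimal sketch, assuming an iconv that exits with a nonzero status on an invalid byte sequence (GNU/glibc iconv does). is_valid_utf8 and list_invalid_utf8 are hypothetical helper names, and unlike the find pipeline this only looks at the current directory.

```shell
# Succeeds if $1 is valid UTF-8 (assumes iconv fails on invalid input).
is_valid_utf8 () {
  printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print the names in the current directory (dot files included)
# that are not valid UTF-8.
list_invalid_utf8 () {
  for f in * .[!.]* ..?*; do
    [ -e "$f" ] || continue          # a pattern that matched nothing
    is_valid_utf8 "$f" || printf '%s\n' "$f"
  done
}
```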
You can check whether the names listed by grep-invalid-utf8 make more sense in another encoding with recode or iconv:
find . | grep-invalid-utf8 | recode latin1..utf8
find . | grep-invalid-utf8 | iconv -f latin1 -t utf8
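For instance, the latin-1 byte 0xE9 ("é") becomes the two-byte UTF-8 sequence C3 A9 after conversion. A minimal sketch using the encoding names ISO-8859-1 and UTF-8, which iconv implementations generally accept:

```shell
latin1=$(printf 'caf\351')          # bytes 63 61 66 e9
utf8=$(printf '%s' "$latin1" | iconv -f ISO-8859-1 -t UTF-8)
printf '%s' "$utf8" | od -An -tx1   # bytes 63 61 66 c3 a9
```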
Once you've determined that a bunch of file names are in a certain encoding (e.g. latin1), one way to rename them is
find . | grep-invalid-utf8 |
rename 'BEGIN {binmode STDIN, ":encoding(latin1)"; use Encode;}
$_=encode("utf8", $_)'
This uses the perl rename command available on Debian and Ubuntu. You can pass it -n to show what it would be doing without actually renaming the files.
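Where the perl rename isn't installed, a similar renaming can be sketched in plain sh with iconv. This assumes an iconv that fails on invalid input, and that every non-UTF-8 name really is latin-1. It only handles the current directory, does not recurse, and refuses to overwrite an existing name. latin1_names_to_utf8 and the DRY_RUN variable are hypothetical names chosen for illustration.

```shell
# Rename non-UTF-8 names in the current directory, reading them as latin-1.
# If DRY_RUN is nonempty, only print the planned renames.
latin1_names_to_utf8 () {
  for old in * .[!.]* ..?*; do
    [ -e "$old" ] || continue                 # a pattern that matched nothing
    # Leave names that are already valid UTF-8 alone.
    if printf '%s' "$old" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      continue
    fi
    # Append "a" so command substitution cannot strip trailing newlines,
    # then remove it again.
    new=$(printf '%sa' "$old" | iconv -f ISO-8859-1 -t UTF-8) || continue
    new=${new%a}
    if [ -n "${DRY_RUN-}" ]; then
      printf '%s -> %s\n' "$old" "$new"
    elif [ -e "$new" ]; then
      printf 'not renaming %s: target exists\n' "$old" >&2
    else
      mv -- "$old" "$new"
    fi
  done
}
```

Running DRY_RUN=1 latin1_names_to_utf8 first shows what would change; running latin1_names_to_utf8 alone then performs the renames.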
Actually, for i in *; do something; done treats every file name correctly, except that file names that begin with a dot are excluded from the wildcard matching. To match all files (except . and ..) portably, match * .[!.]* ..?* and skip any nonexistent file resulting from a non-matching pattern being left intact.
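Putting the three patterns together, here is a sketch that lists every entry of the current directory, including dot files but excluding . and ..; list_all is a hypothetical name. The extra -L test (not in the text above) keeps dangling symbolic links, which -e alone would treat as nonexistent.

```shell
# Print every entry of the current directory, dot files included.
list_all () {
  for f in * .[!.]* ..?*; do
    # A pattern that matches nothing is left as-is; skip it unless it
    # names a real file or a (possibly dangling) symbolic link.
    if [ -e "$f" ] || [ -L "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```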
If you experienced problems, it's probably because you didn't quote $i properly later on. Always put double quotes around variable substitutions and command substitutions, as in "$foo" and "$(cmd)", unless you intend field splitting and globbing to happen.
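A quick way to see what missing quotes do is to count the arguments a value expands to. In this sketch (count_args is a hypothetical helper that prints $#), the unquoted name is split into three words and the last one is subject to globbing, so it becomes at least three arguments; quoted, it is always exactly one.

```shell
# Report how many arguments were passed.
count_args () { printf '%s\n' "$#"; }

name='my file *.txt'
unquoted=$(count_args $name)    # split on spaces, then "*.txt" is globbed
quoted=$(count_args "$name")    # always exactly one argument
printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
```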
If you need to pass the file name to an external command (you don't, here), be careful that echo "$foo" does not always print $foo literally. A few shells perform backslash expansion, and a few values of $foo beginning with - will be treated as an option. The safe and POSIX-compliant way to print a string exactly is printf '%s' "$foo", or printf '%s\n' "$foo" to add a newline at the end. Another thing to watch out for is that command substitution removes trailing newlines; if you need to retain newlines, a possible trick is to append a non-newline character to the data, make sure the transformation retains this character, and finally strip this character off. For example:
mangled_file_name="$(printf '%sa' "$file_name" | tr -sc '[:alnum:]-+_.' '[_*]')"
mangled_file_name="${mangled_file_name%a}"
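Both points can be checked directly. This sketch captures the printf '%s' output of a value that echo might take as an option, and shows command substitution stripping trailing newlines next to the append-a-character trick (the appended character is a, as in the example above).

```shell
foo='-n'
shown=$(printf '%s' "$foo")                   # always the two characters "-n"

stripped=$(printf 'report\n\n')               # trailing newlines lost: "report"
kept=$(printf 'report\n\na'); kept=${kept%a}  # keeps "report" plus two newlines
```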
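To extract the MD5 checksum of the file, avoid having the file name in the md5sum output, since that would make it hard to strip. Pass the data on md5sum's standard input instead; GNU md5sum then prints the hash followed by two spaces and a -, so the hash is simply the first field.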
Note that the md5sum command is not in POSIX. A few Unix variants have md5 instead, or nothing at all. cksum is POSIX but collision-prone.
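Here is a sketch of a helper that copes with these variations. It reads the file on standard input so the name never appears in the output, and keeps only the first field. It assumes GNU md5sum prints "<hash>  -" for standard input and that BSD/macOS md5 -q prints just the hash; otherwise it falls back to the POSIX cksum, whose first field is a CRC rather than an MD5, so names produced on different systems may not match. file_hash is a hypothetical name.

```shell
# Print only the checksum of file $1, read via standard input.
file_hash () {
  if command -v md5sum >/dev/null 2>&1; then
    h=$(md5sum <"$1") || return      # "<hash>  -"
  elif command -v md5 >/dev/null 2>&1; then
    h=$(md5 -q <"$1") || return      # "<hash>"
  else
    h=$(cksum <"$1") || return       # "<crc> <size>", collision-prone
  fi
  printf '%s\n' "${h%% *}"           # keep the first field only
}
```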
See Grabbing the extension in a file name for how to get the file's extension.
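The two parameter expansions involved are ${name##*.} (everything after the last dot) and ${name%.*} (everything before it). Here is a sketch that mirrors the intent of the case statement in the loop below, isolated so it can be tried on its own. get_ext is a hypothetical helper that prints a double extension for .gz and .bz2, a single extension otherwise, and nothing for names with no dot or whose only dot is the leading one.

```shell
get_ext () {
  case $1 in
    *.*.gz|*.*.bz2)
      stem=${1%.*}                         # drop the last extension
      printf '%s\n' ".${stem##*.}.${1##*.}";;
    ?*.*) printf '%s\n' ".${1##*.}";;      # simple extension
    *) printf '\n';;                       # no extension
  esac
}
```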
Let's put it all together (untested). Everything here works under any POSIX shell; you could gain a little, but not much, from bash features.
for old_name in * .[!.]* ..?*; do
  if ! [ -e "$old_name" ]; then continue; fi
  # skip anything md5sum can't read (e.g. a directory)
  hash=$(md5sum <"$old_name") || continue
  hash=${hash%% *}  # keep only the hash, not the trailing " -"
  case "$old_name" in
    *.*.gz|*.*.bz2) # double extension
      ext=".${old_name##*.}"
      tmp="${old_name%.*}"
      ext=".${tmp##*.}$ext";;
    ?*.*) ext=".${old_name##*.}";; # simple extension
    *) ext=;; # no extension
  esac
  mv -- "$old_name" "$hash$ext"
done
Note that I did not consider the case where there is already a target file by the specified name. In particular, if you have existing files whose name looks like your adopted convention but where the checksum part doesn't match the file's contents and instead matches that of some other file with the same extension, what happens will depend on the relative lexicographic order of the file names.
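One way to avoid clobbering is to test for the target before moving. This is a sketch, not the loop above: hash_rename is a hypothetical name, it assumes GNU md5sum, handles only simple extensions, and skips (with a message on standard error) any file whose target name already exists. The test and the mv are separate steps, so it is still not safe against another process creating the target in between.

```shell
hash_rename () {
  for old_name in * .[!.]* ..?*; do
    [ -f "$old_name" ] || continue             # unmatched pattern or not a file
    hash=$(md5sum <"$old_name") || continue
    hash=${hash%% *}
    case $old_name in
      ?*.*) ext=".${old_name##*.}";;
      *) ext=;;
    esac
    new_name=$hash$ext
    [ "$new_name" = "$old_name" ] && continue  # already named correctly
    if [ -e "$new_name" ]; then
      printf 'not renaming %s: %s exists\n' "$old_name" "$new_name" >&2
    else
      mv -- "$old_name" "$new_name"
    fi
  done
}
```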
The -n flag is for … So it's normal if you don't have any changes.
Regarding your command, it's working for me:
Maybe, depending on your shell, you have to escape the |. Or you can use the […] notation to group characters: