I guess you see this � invalid character because the name contains a byte sequence that isn't valid UTF-8. File names on typical unix filesystems (including yours) are byte strings, and it's up to applications to decide on what encoding to use. Nowadays, there is a trend to use UTF-8, but it's not universal, especially in locales that could never live with plain ASCII and have been using other encodings since before UTF-8 even existed.
Try LC_CTYPE=en_US.iso88591 ls to see if the file name makes sense in ISO-8859-1 (latin-1). If it doesn't, try other locales. Note that only the LC_CTYPE locale setting matters here.
In a UTF-8 locale, the following command will show you all files whose name is not valid UTF-8:
grep-invalid-utf8 () {
perl -l -ne '/^([\000-\177]|[\300-\337][\200-\277]|[\340-\357][\200-\277]{2}|[\360-\367][\200-\277]{3}|[\370-\373][\200-\277]{4}|[\374-\375][\200-\277]{5})*$/ or print'
}
find | grep-invalid-utf8
You can check if they make more sense in another locale with recode or iconv:
find | grep-invalid-utf8 | recode latin1..utf8
find | grep-invalid-utf8 | iconv -f latin1 -t utf8
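If latin1 is not the right guess, the same iconv idea can be looped over several candidates. This is a minimal sketch: try_encodings is a hypothetical helper name, and the encoding list in the example call is just a guess at likely candidates, so adjust it to whatever encodings are plausible for your files.

```shell
# Hypothetical helper: show one name decoded from each candidate encoding.
# Usage: try_encodings NAME ENCODING...
try_encodings() {
    name=$1
    shift
    for enc in "$@"; do
        # iconv exits non-zero if the bytes are not valid in $enc
        # or if it does not know the encoding.
        if out=$(printf '%s' "$name" | iconv -f "$enc" -t utf-8 2>/dev/null); then
            printf '%s: %s\n' "$enc" "$out"
        else
            printf '%s: (conversion failed)\n' "$enc"
        fi
    done
}

# Example: a name holding the single byte 0xE9.
try_encodings "$(printf 'caf\351.txt')" latin1 cp1252 koi8-r
```

Whichever line looks like a sensible name points at the encoding the file was most likely created with.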
Once you've determined that a bunch of file names are in a certain encoding (e.g. latin1), one way to rename them is
find | grep-invalid-utf8 |
rename 'BEGIN {binmode STDIN, ":encoding(latin1)"; use Encode;}
$_=encode("utf8", $_)'
This uses the Perl rename command available on Debian and Ubuntu. You can pass it -n to show what it would be doing without actually renaming the files.
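On systems where the Perl rename is not available, a rough equivalent can be put together from iconv and mv. This is only a sketch, not the command above: rename_latin1_to_utf8 and the DRY_RUN variable are hypothetical names, latin1 is assumed as the source encoding, and only the last path component is converted, so feed it paths deepest-first (e.g. from find -depth).

```shell
# Hypothetical helper: rename each given path from Latin-1 to UTF-8.
# With DRY_RUN=1 it prints the planned renames instead of performing them.
rename_latin1_to_utf8() {
    for old in "$@"; do
        dir=$(dirname "$old")
        base=$(basename "$old")
        # Convert only the final component; skip the file if iconv fails.
        new=$(printf '%s' "$base" | iconv -f latin1 -t utf-8) || continue
        # Pure ASCII names come out unchanged; nothing to do.
        [ "$base" = "$new" ] && continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s -> %s\n' "$old" "$dir/$new"
        else
            # -n: never overwrite an existing file.
            mv -n -- "$old" "$dir/$new"
        fi
    done
}

# Preview for one sample path containing the Latin-1 byte 0xE9.
DRY_RUN=1 rename_latin1_to_utf8 "$(printf './caf\351.txt')"
```

Run it with DRY_RUN=1 first and check the preview before letting it touch any files.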
You need to figure out which encoding the partitions in question use for file names, and set that encoding in the mount options for each partition in fstab. I am assuming that the partitions in question are NTFS or FAT from a Windows setup. What language was Windows in? German?
I don't have any Cyrillic characters in my music collection, but I can do Greek with no problem using the latest version of eyed3, installed by sudo pip install --upgrade eyed3:
In the example above, I have a directory (album name) called Μπεστ οφ which contains a song called Κάγκελα Παντού by Τζίμης Πανούσης. As you can see in the id3tool output above, the tags are not in Greek. Let's fix that:
That correctly set the tags using the Greek alphabet:
OK, but since the information is encoded in the name of the file, this can be automated. In the example above, the file name has this format:
So, we can parse and add the tags for all files with a little shell magic:
After running this command, all files will have had their id3 tags modified:
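As a rough sketch of that kind of loop, assume (hypothetically) that each file is named "Artist - Title.mp3" and sits in a directory named after the album; your actual layout may differ, in which case the parameter expansions need adjusting. print_tag_command is an illustrative helper name. It only prints the eyeD3 command it would run, using eyeD3's -a (artist), -A (album) and -t (title) options.

```shell
# Hypothetical layout: "<album dir>/<artist> - <title>.mp3"
print_tag_command() {
    file=$1
    album=$(basename "$(dirname "$file")")   # directory name is the album
    base=$(basename "$file" .mp3)            # drop directory and extension
    artist=${base%% - *}                     # text before the first " - "
    title=${base#* - }                       # text after the first " - "
    printf 'eyeD3 -a "%s" -A "%s" -t "%s" "%s"\n' \
        "$artist" "$album" "$title" "$file"
}

# Preview the commands for every mp3 one directory level down.
for f in */*.mp3; do
    [ -e "$f" ] || continue   # glob matched nothing
    print_tag_command "$f"
done
```

The printed lines are for inspection only (the quoting is naive). Once the parsed fields look right, replace the printf with a direct call such as eyeD3 -a "$artist" -A "$album" -t "$title" "$file".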