Ubuntu – Character encoding problem with filenames – find broken filenames

Tags: encoding, filesystem, find

I have the problem described in this Q&A: several of my files have broken filenames, probably because they were created on quite old Linux distros or on Windows. ls displays a "?" in place of the broken character. I successfully renamed some of these files, but I don't know whether I've found all of them.

Is there any method to find all affected files?

Best Answer

Assuming you are using UTF-8 encoding (the default in Ubuntu), this script should identify the broken filenames and help you rename them.
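To check that assumption before running anything, the locale command can print the character set of your current locale. A minimal sketch, assuming bash and glibc's locale utility as shipped with Ubuntu; the variable name charset is just for illustration:

```bash
#!/bin/bash
# `locale charmap` prints the character set of the current locale.
# The answer's script assumes this is UTF-8.
charset=$(locale charmap)
if [[ $charset == UTF-8 ]]; then
    echo "Locale charset is UTF-8; the script's assumption holds."
else
    echo "Locale charset is $charset, not UTF-8; the script may misbehave." >&2
fi
```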

It works by running find in the C locale, where only ASCII characters count as printable, so that it locates every filename containing a non-ASCII byte. It then checks whether those bytes form valid UTF-8. If they don't, it shows you the filename decoded with each of the encodings listed in the enc array, so you can pick the one that looks right and rename the file accordingly.
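If you only want the list of affected files, the detection half of that logic can run on its own without renaming anything. A minimal sketch, assuming bash with GNU find and iconv as on Ubuntu; list_bad_names is a hypothetical helper name, not part of the answer's script. It prints each path with printf %q, so the offending bytes appear as escapes instead of "?":

```bash
#!/bin/bash
# list_bad_names DIR (hypothetical helper): print every path under DIR whose
# last component is not valid UTF-8. Nothing is renamed.
list_bad_names() {
    local file base
    while IFS= read -rd '' file; do
        base=${file##*/}
        # Same test as the script: UTF-8 -> UTF-8 fails only on invalid input.
        if ! iconv -f UTF-8 -t UTF-8 <<< "$base" >/dev/null 2>&1; then
            # %q shell-quotes the path, showing raw bytes as escapes.
            printf '%q\n' "$file"
        fi
    done < <(LC_ALL=C find "$1" -name '*[![:print:][:space:]]*' -print0)
}
```

Run it as list_bad_names . from the top of the directory tree you want to check; an empty result means no non-UTF-8 names were found.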

latin1 was commonly used on older Linux systems, and windows-1252 is commonly used by Windows nowadays (I think). iconv -l lists the encodings available on your system.
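The choice between those two encodings matters only for some bytes. A minimal sketch, assuming bash and GNU iconv; decode_as is a hypothetical helper name. It shows that byte 0xE9 decodes to "é" in both latin1 and windows-1252, while byte 0x80 is the euro sign only in windows-1252 (latin1 maps it to the invisible control character U+0080):

```bash
#!/bin/bash
# decode_as ENC BYTES (hypothetical helper): decode raw BYTES from ENC to UTF-8.
decode_as() {
    printf '%s' "$2" | iconv -f "$1" -t UTF-8
}

raw=$'caf\xe9'
decode_as latin1 "$raw"; echo          # café
decode_as windows-1252 "$raw"; echo    # café

raw=$'10\x80'
decode_as windows-1252 "$raw"; echo    # 10€
decode_as latin1 "$raw" | od -An -tx1  # bytes 31 30 c2 80 (U+0080, invisible)
```

This is why the script shows every candidate decoding and lets you judge which one looks right.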

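```bash
#!/bin/bash

# List of encodings to try (max 10, since the choice is read as one digit).
enc=( latin1 windows-1252 )

# find (run in the C locale, where every non-ASCII byte is "unprintable")
# supplies NUL-separated paths on fd 3, keeping stdin free for the prompts.
# -depth lists a directory's contents before the directory itself, so
# renaming a directory doesn't invalidate paths that are still queued.
while IFS= read -rd '' file <&3; do
    base=${file##*/} dir=${file%/*}

    # If converting from UTF-8 to UTF-8 succeeds, assume the filename is OK.
    iconv -f UTF-8 -t UTF-8 <<< "$base" >/dev/null 2>&1 && continue

    # Display the filename converted from each encoding to UTF-8.
    printf 'In %s:\n' "$dir/"
    for i in "${!enc[@]}"; do
        name=$(iconv -f "${enc[i]}" -t UTF-8 <<< "$base" 2>/dev/null) ||
            name='(conversion failed)'
        printf '%2d - %-12s: %s\n' "$i" "${enc[i]}" "$name"
    done
    printf ' s - Skip\n'

    while true; do
        # Exit rather than loop forever if stdin is closed.
        read -rn1 -p "? " ans || exit 1
        printf '\n'
        if [[ $ans = [0-9] && ${enc[ans]} ]]; then
            # Rename only if the conversion succeeds and yields a non-empty name.
            if name=$(iconv -f "${enc[ans]}" -t UTF-8 <<< "$base" 2>/dev/null) && [[ $name ]]; then
                mv -iv "$file" "$dir/$name"
                break
            fi
            printf 'Cannot convert from %s, choose another option.\n' "${enc[ans]}"
        elif [[ $ans = [Ss] ]]; then
            break
        fi
    done
done 3< <(LC_ALL=C find . -depth -name "*[![:print:][:space:]]*" -print0)
```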
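To try the script without risking real files, you can first run it in a throwaway directory. A minimal sketch, assuming bash, mktemp and GNU find; the file names are made up for the example. It creates one file whose name is "café" encoded as latin1 (invalid UTF-8) plus one plain ASCII file, and runs the script's own find command to confirm that only the first is picked up:

```bash
#!/bin/bash
# Throwaway directory; mktemp chooses the name.
demo=$(mktemp -d)
# "caf" + byte 0xE9 is "café" in latin1, which is not valid UTF-8.
touch "$demo/caf"$'\xe9'".txt" "$demo/plain.txt"
# The script's find command: in the C locale any non-ASCII byte counts as
# unprintable, so only the latin1-named file is listed.
LC_ALL=C find "$demo" -depth -name '*[![:print:][:space:]]*'
echo "Now cd to $demo, run the script there and answer 0 (latin1)."
```

Answering 0 at the prompt should rename the file to café.txt; remove the directory with rm -r when you are done.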