iconv: illegal input sequence - why?

character encoding, text processing, unicode

While trying to convert a text file into its ASCII equivalent, I get the error message iconv: illegal input sequence at position.

The command I use is iconv -f UTF-8 -t ascii//TRANSLIT file

The offending character is æ.

The text file itself is available here.

Why does it say illegal sequence? The input character is a proper UTF-8 character (U+00E6).

Best Answer

The file is encoded in ISO-8859-1, not in UTF-8:

$ hd 0606461.txt | grep -B1 '^0002c520'
0002c510  64 75 6d 20 66 65 72 69  65 6e 74 20 72 75 69 6e  |dum ferient ruin|
0002c520  e6 0d 0a 2d 2d 48 6f 72  61 63 65 2e 0d 0a 0d 0a  |...--Horace.....|

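The byte "e6" on its own is not a valid UTF-8 sequence. In UTF-8, 0xe6 is the lead byte of a three-byte sequence and must be followed by two continuation bytes in the range 0x80-0xbf, but here it is followed by 0x0d (carriage return). The character æ (U+00E6) encoded as UTF-8 would be the two bytes "c3 a6"; the single byte "e6" is its ISO-8859-1 encoding.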

So, use iconv -f latin1 -t ascii//TRANSLIT file.
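
To check the diagnosis independently of iconv, here is a minimal Python sketch. It assumes the four bytes around offset 0x2c520 copied from the hex dump above ("n", 0xe6, CR, LF); the helper name try_decode is hypothetical and only for illustration. It shows that those bytes are rejected as UTF-8 but decode cleanly as Latin-1, and what æ looks like when it really is UTF-8.

```python
from typing import Optional

# Bytes copied from the hex dump above: "n", 0xe6, CR, LF.
raw = bytes([0x6E, 0xE6, 0x0D, 0x0A])


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if data is not valid in that encoding.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xe6 announces a three-byte UTF-8 sequence, but 0x0d is not a continuation byte.
utf8_result = try_decode(raw, "utf-8")

# In ISO-8859-1 (latin-1) every byte maps to one character, and 0xe6 is U+00E6.
latin1_result = try_decode(raw, "latin-1")

# For comparison: the bytes U+00E6 would have in a genuine UTF-8 file.
utf8_bytes_for_ae = "\u00e6".encode("utf-8")

print(utf8_result)              # None
print(repr(latin1_result))
print(utf8_bytes_for_ae.hex())  # c3a6
```

If the UTF-8 decode fails while the Latin-1 decode yields the expected character, the file is most likely Latin-1 (or a close relative such as Windows-1252), which matches the -f latin1 fix above.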
