Decode weird characters in text file

character encoding

Someone sent me a text file. I can read most of the document, but sometimes there are unusual characters. When I open it in Vim, I see <92> in their place. When I use gedit, I see a character that looks like a square containing two zeros, a 9 and a 4.

Is there a way to decode these funny characters back to their human-readable equivalents?

I also ran the following in the shell:

johncomputer> file --mime-encoding file.txt
johncomputer> file.txt: : utf-8

So I think it's UTF-8 encoded.

Also, most of the characters in this text document are readable. Only some (not all) of the accented characters show up weird.
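`file` only tells you the bytes decode as UTF-8. A file can be valid UTF-8 and still contain C1 control code points such as U+0092 or U+0094. Vim may display either a stray byte 0x92 or the code point U+0092 as <92>, and gedit likely draws the boxed "0094" because the font has no glyph for U+0094. The following is a minimal Python 3.9+ sketch for locating the suspect characters, assuming the file is meant to be UTF-8. `find_suspect_chars` is a hypothetical helper name, not part of any library.

```python
def find_suspect_chars(data: bytes) -> list[tuple[int, str]]:
    """Return (character index, description) pairs for likely mojibake."""
    # surrogateescape turns each undecodable byte 0xNN into the lone
    # surrogate U+DCNN instead of raising, so valid text decodes normally.
    text = data.decode("utf-8", errors="surrogateescape")
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # A raw byte that is not valid UTF-8 at this position.
            hits.append((i, f"invalid UTF-8 byte 0x{cp - 0xDC00:02X}"))
        elif 0x80 <= cp <= 0x9F:
            # Valid UTF-8, but a C1 control code point that should not be in prose.
            hits.append((i, f"C1 control U+{cp:04X}"))
    return hits


if __name__ == "__main__":
    with open("file.txt", "rb") as f:  # read raw bytes, no decoding yet
        for index, description in find_suspect_chars(f.read()):
            print(index, description)
```

If the output lists C1 controls rather than invalid bytes, the file was probably converted to UTF-8 from a single-byte encoding with the wrong source charset.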

Best Answer

The odds are that what you see as <92> and <94> are the Windows-1252 encodings of the "smart" (curly) apostrophe (’, U+2019) and the "smart" right double quotation mark (”, U+201D). They could be just about anything, of course, but in UTF-8 such bytes cannot appear on their own. Bytes in the range 0x80 to 0xBF are only valid as the second or later byte of a multi-byte representation of a character.
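If that guess is right, you can repair the text by reinterpreting each odd byte as Windows-1252. The following is a minimal Python sketch that assumes every stray byte, and every C1 code point U+0080 to U+009F, originally came from a Windows-1252 byte of the same value. `fix_cp1252` is a hypothetical name, and this is only one possible approach, not a general-purpose charset detector.

```python
def fix_cp1252(data: bytes) -> str:
    """Decode UTF-8, reinterpreting stray bytes and C1 controls as Windows-1252."""
    # Invalid bytes 0xNN become lone surrogates U+DCNN instead of raising.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            raw = cp - 0xDC00  # stray byte that was never valid UTF-8
        elif 0x80 <= cp <= 0x9F:
            raw = cp  # cp1252 byte that was mis-converted to a C1 control
        else:
            out.append(ch)  # ordinary character, keep as-is
            continue
        try:
            out.append(bytes([raw]).decode("cp1252"))
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in cp1252.
            out.append(ch if cp < 0xDC80 else "\ufffd")
    return "".join(out)


if __name__ == "__main__":
    with open("file.txt", "rb") as f:
        fixed = fix_cp1252(f.read())
    with open("file.fixed.txt", "w", encoding="utf-8") as f:  # hypothetical output name
        f.write(fixed)
```

Because valid UTF-8 sequences decode normally, correctly encoded accented characters such as é are left untouched. Only the stray bytes and C1 controls are rewritten.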

