Windows – TXT files: how to switch from weird characters back to normal

character encoding, encoding, file-corruption, text-editors, windows 7

I have a txt file written in Cyrillic on a flash drive (my own work, my own pen drive); it is a few years old. When I needed to open it recently, all I saw was a mess of weird characters instead of the text.

I wonder why this is happening and how I can restore it to normal. I tried saving it with Unicode and UTF-8 encoding, and even in some MS-DOS format (an option in WordPad), but it makes no difference at all.

Best Answer

What you're seeing is called mojibake. In short, the application you open the file with is decoding it with the wrong encoding. Re-saving from that application does not help, because it only re-encodes the characters that were already misread. The standard fix is to use a transcoding tool, either online or offline (though I know of no free ones for Windows that work offline), or to open the document in an application that lets you choose the encoding it is read with, and then save it from there in the desired encoding.
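As a rough offline sketch of that transcoding step in Python: read the raw bytes, decode them with the encoding the file was actually written in, and write the result back as UTF-8. The function name `transcode` and the default source encoding `koi8_r` are assumptions for illustration, so substitute whatever encoding your file really uses.

```python
from pathlib import Path


def transcode(src, dst, src_encoding="koi8_r", dst_encoding="utf-8"):
    """Re-encode a text file from src_encoding to dst_encoding.

    The default source encoding is only an assumption; pass the real one.
    """
    raw = Path(src).read_bytes()          # untouched bytes from disk
    text = raw.decode(src_encoding)       # interpret them with the right encoding
    Path(dst).write_bytes(text.encode(dst_encoding))  # line endings stay as they were
    return text
```

Calling `transcode("old.txt", "old_utf8.txt")` (both file names hypothetical) leaves the original untouched and writes a UTF-8 copy next to it. Run it on the original file, not on a copy that was already re-saved from the editor that showed the garbage.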

As a somewhat hacky alternative, if you can save the file without modifying the encoding, you can change the extension to .eml, format the contents like an email message, and make sure the Content-Type header specifies the correct encoding. Then open the resulting file in a good email client (pretty much anything except Outlook or Windows Mail), copy the text out into a text editor, and save it.
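The email trick can be sketched in Python as prepending a few headers to the untouched bytes. The helper name `wrap_as_eml`, the subject line, and the `koi8-r` charset are assumptions for illustration; whether a given mail client honours the header is up to that client.

```python
def wrap_as_eml(raw, charset="koi8-r"):
    """Prepend minimal email headers to raw text bytes.

    `charset` must name the encoding the bytes were really written in;
    "koi8-r" is only an assumed default.
    """
    headers = (
        "MIME-Version: 1.0\r\n"
        "Subject: recovered text\r\n"
        f"Content-Type: text/plain; charset={charset}\r\n"
        "Content-Transfer-Encoding: 8bit\r\n"
        "\r\n"                      # blank line separates headers from body
    )
    return headers.encode("ascii") + raw   # body bytes are left unchanged
```

Write the returned bytes to a file ending in .eml (in binary mode, so nothing gets re-encoded) and open that file in the mail client.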

For future reference, the generally accepted way to avoid this is to save files as either UTF-8 or UTF-16. UTF-8 is usually preferred because, outside Windows, most platforms support it better than UTF-16.
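If you generate such files from a script, a minimal Python sketch of "always state the encoding" could look like the following. The helper names `save_utf8` and `load_utf8` are hypothetical. The optional byte order mark is included only because some older Windows editors rely on it to recognise UTF-8; that is an assumption about your tooling, so leave it off if your tools do not need it.

```python
from pathlib import Path


def save_utf8(path, text, bom=False):
    """Write text as UTF-8, optionally with a leading byte order mark."""
    encoding = "utf-8-sig" if bom else "utf-8"
    Path(path).write_text(text, encoding=encoding)


def load_utf8(path):
    """Read UTF-8 text; "utf-8-sig" strips a BOM if present, else acts like UTF-8."""
    return Path(path).read_text(encoding="utf-8-sig")
```

The point of the design is that nothing depends on the system's default code page, which is what tends to differ between machines and over the years.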

In particular, your file does indeed appear to be encoded using KOI-8 (judging from the statement that the text is Cyrillic and from the apparent distribution of the characters actually shown), with the application apparently interpreting it as ISO-8859-1 or Windows codepage 1252 (judging simply from what is being displayed, plus the fact that these are standard fallback encodings for many devices).
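To make that diagnosis concrete, here is a small Python sketch that reproduces the effect and then guesses the encoding from the character distribution. The candidate list, the names `cyrillic_score` and `guess_encoding`, the sample text, and the choice of the `koi8_r` variant are all assumptions for illustration. The heuristic is deliberately crude: ordinary Russian text is mostly lowercase, and a wrong Cyrillic code page tends to flip the case or produce non-letters.

```python
CANDIDATES = ("koi8_r", "cp1251", "cp866", "iso8859_5")  # assumed shortlist


def cyrillic_score(text):
    """Lowercase Cyrillic letters minus uppercase ones in text."""
    lower = sum("а" <= ch <= "я" for ch in text)
    upper = sum("А" <= ch <= "Я" for ch in text)
    return lower - upper


def guess_encoding(raw, candidates=CANDIDATES):
    """Return the candidate whose decoding looks most like normal prose."""
    return max(
        candidates,
        key=lambda enc: cyrillic_score(raw.decode(enc, errors="replace")),
    )


original = "Привет, мир"
stored = original.encode("koi8_r")        # bytes as they sit on the flash drive
garbled = stored.decode("latin-1")        # what a wrong-encoding viewer displays
recovered = garbled.encode("latin-1").decode("koi8_r")  # undo the misreading

print(garbled)                  # accented Latin letters instead of Cyrillic
print(guess_encoding(stored))   # best-scoring candidate for these bytes
```

The round trip through `latin-1` only works while every original byte is still present; if the garbled text was re-saved in an encoding that cannot represent some of those characters, the information may already be lost.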
