Windows – My notepad txt file turned into weird characters

decoding notepad text-editors windows

Some years ago I wrote some notes in a .txt file (plain text) in Notepad, and when I opened it recently it was full of these weird characters. I don't know at what point it ended up like that, but it could have happened during the move from Windows 7 -> External Drive -> Windows 10 (current).

Some other .txt files in the same folder ended up like that too, although the majority hadn't changed. This makes me suspect either a conversion error between Windows versions or that the files got corrupted.

Also, when I opened it with Notepad++, the same file looked like this. When I pasted it into the Google Translate text box, the characters with a black background turned into some kind of coded matrix, so maybe that gives some kind of clue.

I have already tried decoding it in many ways without success. Does anyone know whether this can be solved by decoding to plain text (ASCII), or whether the files are corrupted and there is no way back?

Thanks.

Best Answer

> Some years ago I wrote some notes in a .txt file (plain text) in Notepad, and when I opened it recently it was full of these weird characters. I don't know at what point it ended up like that, but it could have happened during the move from Windows 7 -> External Drive -> Windows 10 (current).
>
> Some other .txt files in the same folder ended up like that too, although the majority hadn't changed. This makes me suspect either a conversion error between Windows versions or that the files got corrupted.

The files got corrupted. It might be a hardware problem or an OS problem, although it's much more likely that they got corrupted while being copied to or from the external drive (e.g. because of a bad USB connection or a damaged drive) than during an OS upgrade.

> when I opened it with Notepad++, the same file looked like this. When I pasted it into the Google Translate text box, the characters with a black background turned into some kind of coded matrix, so maybe that gives some kind of clue.

These are "control characters" – they're meant to be interpreted by programs and not shown on screen, and normally they wouldn't occur in a text file at all (except for CR/LF/TAB of course). Therefore they don't have a standard visual representation, and different programs have different ways of displaying them if they do occur:

  • Notepad++ (well, its Scintilla core) shows each character's abbreviated name from the ASCII standard, e.g. byte 0x03 is "ETX" (End-of-Text) and 0x18 is "CAN" (Cancel). Some of these names date back to the telegraph era.

  • Your browser uses the same method to show all unprintable characters: the 'matrix' is just a four-digit hexadecimal number indicating that character's Unicode codepoint. (In this case they're U+0018 aka CAN, U+0003 aka ETX, and so on.)

    You can see the same box-with-digits for any character that the OS doesn't have in its fonts, e.g. it will show up for newly released emojis that the OS/browser doesn't yet support.
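
To see which control characters a file really contains, independent of how any editor draws them, the raw bytes can be scanned directly. The following is only a minimal sketch: the function name `find_control_bytes` and the sample bytes are made up for illustration, and the name table is the standard ASCII abbreviations for the C0 control codes.

```python
# Standard ASCII abbreviations for the C0 control codes 0x00-0x1F.
C0_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]

# TAB, LF and CR are normal in text files, so they are not reported.
EXPECTED = {0x09, 0x0A, 0x0D}


def find_control_bytes(data: bytes):
    """Return (offset, byte value, name) for each unexpected control byte."""
    hits = []
    for offset, b in enumerate(data):
        if b == 0x7F:
            hits.append((offset, b, "DEL"))
        elif b < 0x20 and b not in EXPECTED:
            hits.append((offset, b, C0_NAMES[b]))
    return hits


# Hypothetical sample; for a real file, read it in binary mode instead:
#   data = open("notes.txt", "rb").read()
sample = b"notes\r\n\x03\x18 more text"
for offset, b, name in find_control_bytes(sample):
    print(f"offset {offset}: byte 0x{b:02X} = U+{b:04X} ({name})")
```

A healthy plain-text file should produce no hits at all; many hits scattered through the text suggest the bytes themselves are damaged rather than merely displayed with the wrong encoding.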

> I have already tried decoding it in many ways without success. Does anyone know whether this can be solved by decoding to plain text (ASCII), or whether the files are corrupted and there is no way back?

In Notepad++, the file looks kind of like it's half UTF-8 and half garbage (the accented 'A's tend to show up when a UTF-8 file is misinterpreted as Windows-1252).

However, in this case it's probably just a coincidence and there's likely nothing decodable in this file anymore.
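
For comparison, this is roughly what a recoverable UTF-8/Windows-1252 mix-up looks like, and how one might test for it. It is a sketch under assumptions: the helper name `try_undo_mojibake` and the sample strings are invented for illustration, and a successful round trip only suggests, rather than proves, that the text was mis-decoded.

```python
def try_undo_mojibake(text: str):
    """Try to reverse UTF-8 bytes that were wrongly decoded as Windows-1252.

    Returns the repaired string, or None if the text cannot be mapped back
    to valid UTF-8 (which points to real corruption, not a wrong encoding).
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


# Simulate the mix-up: UTF-8 bytes read as if they were Windows-1252.
original = "ação"
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)                      # aÃ§Ã£o
print(try_undo_mojibake(garbled))   # ação

# An accented 'A' followed by control characters is not valid UTF-8,
# so nothing can be recovered from it this way.
print(try_undo_mojibake("Ã\x03\x18"))  # None
```

Text that is already plain ASCII passes through this round trip unchanged, so the check is only informative for text that contains the accented characters in the first place.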
