This phenomenon has left me with some questions.
Here is the detailed experiment (my OS is Windows 7 x64 SP1):
- I changed a picture (JPG) file to TXT by simply changing its extension (or one could just open the JPG with Notepad, which amounts to the same thing).
It should look like this: odd-looking sequences of characters, a very few of which are actually meaningful, like "creator: dg-jpeg v1.0…" in the screenshot below.
- I disabled word wrap, selected all the text using Ctrl+A (to make sure nothing was missed) and copied it.
- I pasted the copied text into another blank TXT file and saved it as JPG, then compared the new file's size with the original JPG. All three files (the original JPG, the converted TXT file and the newly created TXT file) are exactly the same size, down to the byte.
When I tried to open the new file, Windows would say "Windows Photo Viewer can't open this picture because the file appears to be damaged, corrupted, or is too large".
I also tested it another way: I opened the JPG with Notepad, cut ONE known character from a location that is easy to remember (like the first character of the 2nd line), then saved the file. The viewer would of course display the same message. Then I opened the file again and pasted the character back at the EXACT location (Notepad remembers its exit state, like window position, wrapping, font size…, so I have no problem getting this right).
And still I get the same error. You can try this yourself to get the idea; remember to choose a small picture, otherwise Notepad will act like an old rusty man.
What could have been the cause of this phenomenon?
Depending on the encoding used to open the file, you might see different behaviour. My Windows 7 Notepad can open a file as ANSI, UTF-8, Unicode or Unicode big endian.
I've tested this issue with a small 2x2 pixel JPEG image created with GIMP, opening and saving the image file with ANSI encoding. Opening both the original and the saved image with a hex editor, I see that all 00 bytes (two hex digits, the NUL control character) have been converted to 20 (the space character).
Replacing every 20 with 00 again in the hex editor restores the image.
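To make the byte-level effect concrete, here is a minimal Python sketch that mimics the observed NUL-to-space substitution on the first bytes of a typical JFIF header and then reverses it. The function names and the sample bytes are my own illustration, not anything Notepad exposes, and the sketch only models the effect seen in the hex editor, not Notepad's actual code. It also shows a caveat: blindly replacing 20 with 00 only works when the original file contained no genuine 20 bytes.

```python
def simulate_notepad_ansi_save(data: bytes) -> bytes:
    """Model the observed effect: every NUL byte (0x00) becomes a space (0x20)."""
    return data.replace(b"\x00", b"\x20")


def diff_offsets(a: bytes, b: bytes) -> list:
    """Return the offsets at which two equally long byte strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Illustrative sample: the first 14 bytes of a typical JFIF header.
original = bytes.fromhex("ffd8ffe000104a46494600010101")
saved = simulate_notepad_ansi_save(original)

# Same size, but the bytes at the former NUL positions differ.
changed = diff_offsets(original, saved)

# Reversing the substitution works here because this sample
# contains no genuine 0x20 bytes.
restored = saved.replace(b"\x20", b"\x00")

# Caveat: if the original already held a real space, the blind
# reverse replacement turns it into a NUL as well.
tricky = b"\x00\x20"
tricky_restored = simulate_notepad_ansi_save(tricky).replace(b"\x20", b"\x00")

print(len(original) == len(saved), changed, restored == original)
print(tricky_restored == tricky)
```

This is also why the file sizes in the question match to the byte even though the image no longer opens: the substitution changes byte values, not the byte count.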
I've googled it a bit and didn't find any references that explain why it does that, only a post that warns about it (Google cache link; the page is no longer available).
If you save/open the file as UTF-8, it seems that it still converts NUL characters to spaces, but it also increases the resulting file size because single-byte characters are converted to UTF-8 multi-byte sequences.
If you save/open the file as Unicode (UTF-16), it seems that it still converts NUL characters to spaces, but it also adds a byte order mark (BOM), which is two bytes long in UTF-16, to the beginning of the file.
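The size effects of the other encodings can be sketched in Python as well. This is only a rough model under stated assumptions: the helper `resave` is hypothetical, `latin-1` stands in for the ANSI code page so that every input byte maps to exactly one character, and "Unicode" is taken to mean UTF-16 little endian, whose BOM is the two bytes FF FE. Any further changes Notepad may make on save are not modelled.

```python
import codecs


def resave(data: bytes, encoding: str, bom: bytes = b"") -> bytes:
    """Hypothetical model of opening bytes as text and saving in another encoding.

    Assumption: each input byte is read as one character (latin-1 is used
    as a stand-in for the ANSI code page), and NUL becomes a space.
    """
    text = data.decode("latin-1").replace("\x00", " ")
    return bom + text.encode(encoding)


# Illustrative sample: 10 bytes, four of them >= 0x80 and one NUL.
sample = bytes.fromhex("ffd8ffe00010") + b"JFIF"

# UTF-8 needs two bytes for each character in the range U+0080..U+07FF,
# so the four high bytes grow the output by four bytes.
as_utf8 = resave(sample, "utf-8")

# UTF-16 LE uses two bytes per character here, plus the two-byte BOM.
as_utf16 = resave(sample, "utf-16-le", codecs.BOM_UTF16_LE)

print(len(sample), len(as_utf8), len(as_utf16))
print(as_utf16[:2].hex())
```

In both cases the output is no longer the same length as the input, so unlike the ANSI case the damage cannot be undone by a simple 20-to-00 replacement.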