I've created a simple pdf [hi.pdf
] with the word hi
and when I open it in Notepad++, its encoding is ANSI, which I assume is Notepad++'s best guess, with it opening successfully when I Save as hiSaveAs.pdf
.
However, when I copy the contents of hi.pdf
from Notepad++, pasting into a new file and saving as hiANSI.pdf
with an encoding of ANSI, the file is corrupted and can't be opened:
Error, failed to load pdf document.
- When I re-open
hiANSI.pdf
in Notepad++, it has UTF8 listed as the encoding and when I compare it tohi.pdf
, I notice it has whitespaces wherehi.pdf
has theNUL
character: - If I change the encoding of
hiANSI.pdf
to ANSI instead of UTF8, the text differs fromhi.pdf
even more:
Can someone explain what is happening here?
- Why does Save as work, but copying the exact same text into a new Notepad++ file results with a whitespace instead of the
NUL
char? - Why does Notepad++ think
hiANSI.pdf
is UTF8, buthi.pdf
ANSI?
This does not answer this question.
The MSB is not being stripped. Have a look at the hex comparison:
For example, why is 0A being added between 0D and 25 (first row, 10th byte)?
UPDATE:
I noticed Notepad did much less than Notepad++ in terms of "helping". For example when I saved hi.pdf as hiANSI.pdf using Notepad instead of Notepad++, the only thing Notepad did to help was add 0x0A
(line feed) after 0x0D
(carriage return), and replaced 0x00
(NUL) with 0x20
(space):
If I saved hi.pdf as hiANSI.bin, it did even less. It just replaced 0x00
with 0x20
:
In the above two cases, it produced a valid PDF but with "hi" replaced with "IJ":
UPDATE
If I replace the following 0x20 bytes in hiANSI.pdf with 0x00 to match hi.pdf, it displays "hi" instead of "IJ" but with a different font:
Here are the two bytes I changed (highlighted in yellow):
Why does changing these two bytes have this effect?
Best Answer
Notepad++ is a text-editor, not a binary editor, so it "corrected" the text when pasting.
In your example, the
0D
was taken to be carriage-return, which was taken to be part of the end-of-line character in Windows, but still missing the0A
(line-feed). So Notepad++ has thoughtfully corrected your text.For more information see Wikipedia:
For a freeware hex editor, see for example HxD.