MacOS – Preview copies aliens. How to change the encoding/make it work


Preview copies aliens! Literally. Here is a screenshot:

[Screenshot: the pasted text shown as alien glyphs in TextEdit]

I selected text in a PDF in Preview, copied it (Cmd+C), and pasted it (Cmd+V) into TextEdit. What did I get? A bunch of aliens.

I'm guessing this is an encoding issue. Is there a way to change the encoding in Preview, or something similar? Is there a way to check which encoding it is using and then choose the appropriate one in TextEdit? Here is the text, so you can see the squares for yourselves (no aliens here; apparently that error character appears only in native OS X apps, not in browsers).

????????? ????? ???????? ??? ???? ?????? ??????? ??? ???? ?????? ??? ? ?????????????????????????????????????????????????????????????????? ?????????????????????????????????????????????????????

[Image: the alien glyph]

(I must admit OS X's error character is pretty amusing.)

Best Answer

The PDF file has been protected, so you can't copy and paste text from it. The alien is the Plane 16 glyph, which is displayed when a character falls in a "private use" area of the Unicode specification (Plane 16 is the Supplementary Private Use Area-B, U+100000 to U+10FFFD). This has been done deliberately to prevent copy-pasting.

Also, the alien is not OS X's generic error character. It appears only when the text contains a character from that private use area.
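The question asks how to check which encoding Preview is using. A more direct check is to look at the Unicode code points that actually landed on the clipboard. The Python sketch below is an illustration, not part of the original answer, and `describe_code_points` is a hypothetical helper name. It labels each character with its Unicode plane and flags private-use characters using `unicodedata.category`, which returns `"Co"` for them; it also distinguishes them from U+FFFD, the standard replacement character.

```python
import unicodedata


def describe_code_points(text: str) -> list[tuple[str, int, str]]:
    """Return (code point as U+XXXX, Unicode plane, label) for each character."""
    rows = []
    for ch in text:
        cp = ord(ch)
        plane = cp >> 16  # each plane holds 0x10000 code points
        if unicodedata.category(ch) == "Co":
            label = "private use"
        elif cp == 0xFFFD:
            label = "replacement character"
        else:
            label = unicodedata.name(ch, "unnamed")
        rows.append((f"U+{cp:04X}", plane, label))
    return rows


if __name__ == "__main__":
    # Example input: a Plane 16 private-use character, a normal letter,
    # and a BMP private-use character. Paste your own text here instead.
    sample = "\U00100041A\uE000"
    for row in describe_code_points(sample):
        print(row)
```

If pasted text is made of rows labelled "private use" (especially in plane 16), no encoding setting in TextEdit will turn them into readable letters, because the code points themselves carry no standard meaning.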

A PDF can optionally include, for each font, a mapping from the font's character codes to Unicode (a ToUnicode CMap). This particular PDF maps its characters into the Unicode private use area, so the correct glyphs appear only in a PDF reader, where the embedded font draws them.

In plain English: the PDF's characters are mapped to different code points in the Unicode private use area. Outside the PDF, each of them would simply be shown as the Plane 16 glyph, but because the PDF contains a map that translates those private-use characters into readable Latin glyphs (a, b, R, F, etc.), you see words in the PDF and aliens everywhere else.
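To make the plain-English explanation concrete, here is a small Python sketch. The mapping `PUA_TO_LATIN` and the helper `render_with_map` are hypothetical: the code points and letters are made up and are not taken from any real PDF. The sketch only shows why the same code points read as words when a mapping is applied (as inside the PDF) and as aliens when it is not (as after pasting). Nothing here extracts the real mapping from a PDF file.

```python
# Hypothetical map: a made-up stand-in for the private-use-to-Latin mapping
# described above. A real PDF would use its own, per-font assignments.
PUA_TO_LATIN = {
    0x100061: "a",
    0x100062: "b",
    0x100052: "R",
    0x100046: "F",
}


def render_with_map(text: str) -> str:
    """Replace mapped private-use code points with Latin letters; leave others as-is."""
    # str.translate accepts a dict keyed by code point (int) with string values.
    return text.translate(PUA_TO_LATIN)


pasted = "\U00100046\U00100061\U00100062"  # what lands on the clipboard
print(render_with_map(pasted))  # Fab
print(pasted == render_with_map(pasted))  # False
```

Without the map, `pasted` is just three Plane 16 code points, which is exactly what TextEdit receives and draws as aliens.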