MacOS – Disable automatic “ligature” handling in PDF/Preview on El Capitan

macos pdf preview

I may have found a serious bug in El Capitan (EC)…

As a college student I frequently need to copy and paste short passages out of PDF documents into other documents (e.g. for quoting). Previously, on Yosemite, I simply used Preview to do this. Preview is awesome because its AI algorithms almost always properly determine where paragraph breaks are. Just about every other PDF viewer I've used simply puts hard returns at every visual line break in the PDF.

On EC, however, certain common "ligature" characters (i.e. two or more characters rendered together, such as "Th", "ffi", "ff", "fi", "ft" – these are the ones I've found so far) end up copying as blank spaces. In other words, if I have a PDF containing text that reads:

"This is different from those who can afford to pay for college first; such students may find it less difficult by at least fifty percent."

I'll get this output when I copy and paste this text:

" is is di erent from those who can a ord to pay for college  rst; such students may  nd it less di cult by at least   y percent."

This is clearly not the desired output!

When I slowly select the text character by character I can clearly see that the selection is moving over all of the letters in the ligature at the same time – in other words, it is treating the ligature as if it is one character.

I tested the PDF on a lab machine at school, which is still running Yosemite, and it did not exhibit this behavior, so this is very definitely a bug introduced in EC.

Is there a setting, perhaps via defaults, that I can use to completely disable this behavior?

(As a side note, I found out about this bug after turning in a paper and having a professor ask why my quotes looked so funny… I didn't proofread as carefully as I should have, but still.)

Best Answer

Are you sure it isn't a simple font substitution error? What font is the PDF rendered in? Do you have that font installed? Does the lab machine have it installed? Does the document you are pasting into use the same font as the source PDF?

I did a simple test using your sample text, placed in a Word document (with ligatures enabled), then saved as a PDF using either Apple's built-in PDF output or Acrobat Pro.

The original text, cut and pasted from Word:
“This is different from those who can afford to pay for college first; such students may find it less difficult by at least fifty percent.”

Apple PDF, cut and pasted from Preview:
“This is different from those who can afford to pay for college first; such students may find it less difficult by at least fi,y percent.”

Adobe PDF, cut and pasted from Preview:
“This is different from those who can afford to pay for college first; such students may find it less difficult by at least fi�y percent.”

Adobe PDF, cut and pasted from Acrobat Pro:
“This is different from those who can afford to pay for college first; such students may find it less
difficult by at least fi?y percent.”

All of the above attempts pasted correctly, with the exception of the "fty" ligature, which differed in rendering based upon the PDF creation and rendering method(s) used. This character likely didn't translate correctly into plain text because I have enabled more than just basic ligatures in Word (optional and discretionary, but not historical).
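To see what a ligature actually turned into after pasting, one option is to list every non-ASCII code point in the pasted string. This is a minimal Python sketch, not something that was run as part of the test above. The helper name describe_suspicious and the sample string are made up for illustration, and the sample assumes the odd character is U+FFFD (REPLACEMENT CHARACTER), as it displays here.

```python
import unicodedata

def describe_suspicious(text):
    """Return (index, code point, Unicode name) for each non-ASCII character.

    Hypothetical helper for illustration; it only reports, it does not repair.
    """
    report = []
    for i, ch in enumerate(text):
        if ord(ch) > 0x7F:
            # Private-use and unassigned code points have no name,
            # so fall back to a placeholder instead of raising ValueError.
            name = unicodedata.name(ch, "<unnamed>")
            report.append((i, f"U+{ord(ch):04X}", name))
    return report

# Assumed sample: the tail of the sentence as pasted from the Adobe PDF.
pasted = "at least fi\ufffdy percent"
for row in describe_suspicious(pasted):
    print(row)
# prints (11, 'U+FFFD', 'REPLACEMENT CHARACTER')
```

A result of "<unnamed>" would suggest the PDF maps the glyph to a private-use code point, which points toward an encoding problem in the file rather than in the target font.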

All in all, it looks to me to be nothing but a font encoding or substitution error. Remember that a PDF document will have any necessary font characters embedded in the file itself, but when pasting, the target font will not necessarily match the source, especially if you are using a system with a clean install (meaning fewer fonts).
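If the pasted text turns out to contain real Unicode ligature code points (U+FB00 to U+FB06, for example the single character "ﬁ"), they can be expanded back into plain letters with compatibility normalization. Below is a rough Python sketch under that assumption; expand_ligatures is a hypothetical helper name. It cannot recover ligatures that were pasted as spaces or as a replacement character, and ligatures such as "Th" or "ft" have no dedicated Unicode code point, so they are not covered.

```python
import unicodedata

# Latin ligature code points: U+FB00 (ff) through U+FB06 (st).
LIGATURE_RANGE = range(0xFB00, 0xFB07)

def expand_ligatures(text):
    """Expand Latin ligature characters into their component letters.

    Only characters in U+FB00..U+FB06 are normalized, so other characters
    that NFKC would also rewrite (such as the ellipsis) stay untouched.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if ord(ch) in LIGATURE_RANGE else ch
        for ch in text
    )

# Assumed sample containing the ff, fi and ffi ligature characters.
sample = "di\ufb00erent, \ufb01rst, di\ufb03cult"
print(expand_ligatures(sample))
# prints different, first, difficult
```

Restricting the normalization to the ligature range is a deliberate choice here: applying NFKC to a whole quotation would also alter characters like non-breaking spaces and superscripts, which is usually not wanted when quoting.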