What causes PDF file size to increase when saving in Preview

pdf preview

It seems that making edits, adding annotations, or even just opening and saving a PDF file in Preview causes a significant increase in file size. For some books I've scanned, I've noticed that this also improves page rendering time.

Can anyone shed some light on what causes these changes? I am interested in syncing annotations of PDF ebooks between Preview and the iPad (maybe GoodReader), but this may be impractical with large PDF files.

Best Answer

In his little-known novel, PDF Karenina, Leo Tolstoy wrote,

Optimally encoded PDF files are all alike; every sub-optimally encoded PDF file is sub-optimally encoded in its own way.

It's hard for anyone to answer why your PDF files are larger after Preview modifies them. A PDF file consists of many different kinds of data: images, content streams, fonts, document overhead, color spaces, extended graphics states, and a cross reference table. Just as one sentence might be concise and another verbose while both are valid English and say the same thing, one PDF file might represent the same content more verbosely than another. We'd have to look at your exact PDF files. It's likely that they were created by a variety of different pieces of software, some concise, some less so.

It also matters what version of Mac OS X and Preview you are using, because that determines the software that writes the new PDF file when you do a Save As in Preview.
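To make the concise-versus-verbose point concrete, here is a minimal Python sketch; it is not taken from any of the files discussed here. It builds two hypothetical content streams that draw the same line of text, one with a single Tj operator and one with a separate text object per character, and compares their sizes before and after zlib compression (the scheme behind PDF's FlateDecode filter). The font resource name /F1 and the fixed 7.2-point advance (a fixed-width font such as Courier at 12 points) are assumptions for illustration.

```python
import zlib

TEXT = "Optimally encoded PDF files are all alike"

# Concise: one text object that shows the whole string with a single Tj.
concise = f"BT /F1 12 Tf 72 720 Td ({TEXT}) Tj ET\n".encode("ascii")

# Verbose: one text object per character, each re-selecting the font and
# restating an absolute position. The 7.2 pt advance is an assumption
# (a fixed-width font at 12 pt); /F1 is a hypothetical resource name.
verbose = b"".join(
    f"BT /F1 12 Tf {72 + 7.2 * i:.1f} 720 Td ({ch}) Tj ET\n".encode("ascii")
    for i, ch in enumerate(TEXT)
)


def flate_size(data: bytes) -> int:
    """Return the size of data after zlib (deflate) compression."""
    return len(zlib.compress(data, 9))


for name, stream in (("concise", concise), ("verbose", verbose)):
    print(f"{name}: {len(stream)} bytes raw, {flate_size(stream)} bytes deflated")
```

Both streams describe the same page content, but the verbose one stays noticeably larger even after compression.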

I can, however, tell you what gets larger about some of my PDF files. This story applies to my computer, running Mac OS X 10.5.8 and Apple Preview 4.2 (469.5).

One file, Giulio.pdf, is a 22-page document with text as text, not scanned images. It is 461,092 bytes large. I opened it in Preview, did File... Save As..., and saved it under a new file name. The new file is 724,421 bytes, or 57% larger.

I opened each file with Adobe Acrobat Professional, version 8.3.1 for Mac OS. I did Advanced... PDF Optimizer... Audit Space Usage.... A small dialog box gave a break-down of how many bytes were due to each category of usage, plus the percent of the total file size for the category.
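For readers without Acrobat, a very rough approximation of such an audit can be sketched in Python using only the standard library. This is not what Acrobat does. The function name audit_streams is mine, it only separates image streams from all other streams (content streams, fonts, and so on are lumped together), it ignores objects packed inside object streams (PDF 1.5 and later), and it can be fooled by binary data that happens to contain the keywords it scans for.

```python
import re

# "N G obj ... endobj" pairs; a crude scan, not a real PDF parser.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
# Inside one object: dictionary, then the stream data between the keywords.
STREAM_RE = re.compile(rb"(.*?)\bstream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def audit_streams(pdf_bytes: bytes) -> dict:
    """Tally stored stream bytes as images vs. other streams (rough estimate)."""
    totals = {"images": 0, "other streams": 0}
    for obj in OBJ_RE.finditer(pdf_bytes):
        found = STREAM_RE.match(obj.group(1))
        if found is None:
            continue  # object has no stream (e.g. a plain dictionary)
        dictionary, data = found.group(1), found.group(2)
        kind = "images" if IMAGE_RE.search(dictionary) else "other streams"
        totals[kind] += len(data)
    # Whatever is not stream data: dictionaries, cross reference table, etc.
    totals["everything else"] = len(pdf_bytes) - sum(totals.values())
    return totals


def report(totals: dict) -> None:
    """Print each category with its byte count and share of the file."""
    whole = sum(totals.values())
    for name, size in totals.items():
        print(f"{name}: {size} bytes ({100.0 * size / whole:.2f}%)")
```

To try it on a real file, pass the file's bytes, for example report(audit_streams(open("Giulio.pdf", "rb").read())).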

The original Giulio.pdf has 390,754 bytes (84.75%) devoted to content streams, and zero bytes devoted to images. It is in the PDF 1.4 format. The file saved by Preview has 675,846 bytes (93.29%) devoted to content streams, also zero bytes of images, and is in the PDF 1.3 format. Preview made the content streams 285,092 bytes (73%) larger, which is slightly more than the whole 263,329-byte difference in file size between the two; the other categories must have shrunk a little.
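A quick way to check this arithmetic, using only the byte counts reported above, is a few lines of Python. The variable names are mine; nothing here reads the actual files.

```python
# Byte counts reported for Giulio.pdf before and after Preview's Save As.
original_total, preview_total = 461_092, 724_421
original_streams, preview_streams = 390_754, 675_846


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


file_growth = preview_total - original_total        # growth of the whole file
stream_growth = preview_streams - original_streams  # growth of content streams

print(f"file grew by {file_growth} bytes "
      f"({percent(file_growth, original_total):.1f}% of the original file)")
print(f"content streams grew by {stream_growth} bytes "
      f"({percent(stream_growth, original_streams):.1f}% of the original streams)")
print(f"stream growth is {percent(stream_growth, file_growth):.1f}% "
      f"of the file growth")
```

The script keeps apart two ratios that are easy to conflate: how much the content streams grew relative to their own original size, and how that growth compares with the growth of the whole file.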

I wondered if the PDF 1.3 file format was inherently less efficient for storing this kind of file. I opened the original Giulio.pdf in Adobe Acrobat Professional 8, did Advanced... PDF Optimizer... Make compatible with: Acrobat 3.0 and later, and pressed OK. I saved the resulting file under a new name. The resulting file is in the PDF 1.3 format and is 452,356 bytes, about 2% smaller than the original. Its content streams are 375,171 bytes (82.94%), a similar proportion of the file, but smaller than the content streams of the original file.

Thus it seems we can conclude that the Preview app on Mac OS X 10.5.8 is not as efficient as some other PDF creators at making concise content streams in PDF files, and that this difference alone is enough to account for the entire size increase in this PDF file without images.

I did a similar experiment on form k.pdf, a 1-page document scanned from paper. The original file is 303,730 bytes, of which 298,197 bytes (98.18%) are images. A copy of this file created by Preview using Save As... is 300,601 bytes, or 1% smaller. This file size difference is more than accounted for by a smaller "document overhead" category of bytes in the file created by Preview.

Thus it seems we also can conclude that Preview doesn't always cause a PDF file to increase in size. It depends on the nature of the original PDF file, and how concise it was to start with.