Word – Why are PDFs generated from MS Word so large?

docx, microsoft word, pdf

I created a simple MS Word document containing just this sentence:

This is a small document.

Nothing else. Then I saved this document as a DOCX and as a PDF. Here are the file sizes:

DOCX: 12 kB
PDF: 89 kB

This difference is huge, and it really starts to bother me when mostly textual documents that are tens of kB as DOCX produce PDFs that are hundreds of kB. What's so inefficient about the PDF format? Or is Word just using some terrible output algorithm?

BTW, the PDF output settings were set to create the smallest file possible:

[Screenshot: PDF output options]

Best Answer

If you open the PDF in Notepad++, you'll find:

9 0 obj
<</Filter/FlateDecode/Length 79100/Length1 171804>>
stream
xœì}    XTGºvÕ9½/t7Ðl
..... many more bytes  ...   ëH|  
endstream
endobj
10 0 obj

That object (9 0) is referenced near the end of the file, in the /FontFile2 entry of the font descriptor:

6 0 obj
<</Type/FontDescriptor/FontName/ABCDEE+Calibri/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 521/MaxWidth 1743/FontWeight 400/XHeight 250/StemV 52/FontBBox[ -503 -250 1240 750] /FontFile2 9 0 R>>
endobj

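The fonts used by the Word document get embedded into the PDF so that the PDF is self-contained. Here the embedded Calibri font stream alone takes 79100 bytes compressed (171804 bytes uncompressed, per /Length1), which accounts for most of the 89 kB file.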
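To check this without reading the file by eye, here is a rough Python sketch (not part of the original answer) that looks for /FontFile references and reports the /Length of each referenced stream. The function name `embedded_font_sizes` and the `sample` bytes are made up for illustration. It assumes objects are stored uncompressed in the file body and that /Length is a direct number rather than an indirect reference, so it will miss fonts in PDFs that use object streams.

```python
import re

# One indirect object: "<number> <generation> obj ... endobj".
OBJECT = re.compile(rb"(\d+)\s+\d+\s+obj(.*?)endobj", re.DOTALL)
# "/Length 79100" but not "/Length1 171804" (the "1" is not whitespace).
LENGTH = re.compile(rb"/Length\s+(\d+)")
# "/FontFile2 9 0 R" (also /FontFile and /FontFile3).
FONT_REF = re.compile(rb"/FontFile[23]?\s+(\d+)\s+\d+\s+R")


def embedded_font_sizes(pdf_bytes):
    """Return {object number: compressed stream length} for embedded fonts."""
    font_objects = {int(n) for n in FONT_REF.findall(pdf_bytes)}
    sizes = {}
    for match in OBJECT.finditer(pdf_bytes):
        number = int(match.group(1))
        if number not in font_objects:
            continue
        # Only look at the dictionary, i.e. the part before the stream data.
        dictionary = match.group(2).split(b"stream", 1)[0]
        length = LENGTH.search(dictionary)
        if length:
            sizes[number] = int(length.group(1))
    return sizes


# Shortened stand-in for the two objects quoted above (stream data elided).
sample = (
    b"9 0 obj\n<</Filter/FlateDecode/Length 79100/Length1 171804>>\n"
    b"stream\n...\nendstream\nendobj\n"
    b"6 0 obj\n<</Type/FontDescriptor/FontName/ABCDEE+Calibri"
    b"/FontFile2 9 0 R>>\nendobj\n"
)

print(embedded_font_sizes(sample))  # {9: 79100}
```

For a real file you would pass the result of `open(path, "rb").read()` instead of `sample`. Comparing the reported sizes with the total file size shows how much of the PDF is font data.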

I used this slide deck to decipher the PDF instructions.

If you want to prevent the fonts from being embedded in the PDF file, make sure your Word document uses only the 14 standard typefaces available in PDF viewers (source: Wikipedia):

  • Times New Roman > Times (v3) (in regular, italic, bold, and bold italic)
  • Courier New > Courier (in regular, oblique, bold and bold oblique)
  • Arial > Helvetica (v3) (in regular, oblique, bold and bold oblique)
  • Symbol > Symbol
  • Wingdings > Zapf Dingbats
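
As a small illustration of the list above, this Python sketch flags which fonts in a document have no standard equivalent and are therefore likely to be embedded. The names `STANDARD_EQUIVALENTS` and `fonts_likely_embedded` are hypothetical, and the mapping is only the one given in the list. It compares names only, so it cannot tell you whether a particular exporter actually leaves a font out.

```python
# Word font name (lowercase) -> standard PDF typeface, as in the list above.
STANDARD_EQUIVALENTS = {
    "times new roman": "Times",
    "courier new": "Courier",
    "arial": "Helvetica",
    "symbol": "Symbol",
    "wingdings": "Zapf Dingbats",
}


def fonts_likely_embedded(font_names):
    """Return the fonts that have no standard-typeface equivalent in the list."""
    return [
        name
        for name in font_names
        if name.strip().lower() not in STANDARD_EQUIVALENTS
    ]


print(fonts_likely_embedded(["Calibri", "Arial", "Times New Roman"]))  # ['Calibri']
```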