I just learned that PDF files can be compressed to reduce their disk size.
- I was wondering how to tell whether a PDF file has already been compressed.
- What applications/commands can be used to compress or uncompress a PDF file?
My environment is Linux Ubuntu 10.10.
Some attempts don't give satisfactory results:
- Here are the results of trying pdftk:

    $ pdftk 3.pdf output 5.pdf uncompress
    $ pdftk 3.pdf output 3comp.pdf compress
    $ ls -l 3.pdf 3comp.pdf 5.pdf
    -rwxrwx--- 1 root plugdev  8652269 2011-07-30 12:27 3comp.pdf
    -rwxrwx--- 1 root plugdev  8652319 2011-07-29 22:15 3.pdf
    -rwxrwx--- 1 root plugdev 16829828 2011-07-30 12:27 5.pdf
The file properties show that none of them is optimized.
- Results of converting to PS and then back to PDF:

    $ pdf2ps 3.pdf 3.ps
    $ ps2pdf 3.ps 3c.pdf
    $ ls -l 3.pdf 3.ps 3c.pdf
    -rwxrwx--- 1 root plugdev   8808946 2011-07-30 13:14 3c.pdf
    -rwxrwx--- 1 root plugdev   8652319 2011-07-29 22:15 3.pdf
    -rwxrwx--- 1 root plugdev 122375966 2011-07-30 13:14 3.ps
Best Answer
In short:

To know if it's already compressed:

    strings your.pdf | grep /Filter
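A slightly more informative variant of the command above is to tally which filter names follow /Filter, so you can tell, for example, JPEG-encoded images (/DCTDecode) apart from zlib/deflate-compressed streams (/FlateDecode). This is a minimal sketch assuming GNU grep, sed, sort and uniq; pdf_filter_tally is a hypothetical helper name, and it reads the raw bytes with grep -a instead of piping through strings. For a filter array such as [/FlateDecode /DCTDecode], only the first name is counted.

```shell
#!/usr/bin/env bash
# Tally the filter names that follow /Filter in a PDF's raw bytes.
# Matches "/Filter /FlateDecode" as well as "/Filter [/FlateDecode ...]";
# for a filter array, only the first name is counted in this sketch.
# pdf_filter_tally is a hypothetical helper name, not an existing tool.
pdf_filter_tally() {
  # grep -o prints each match on its own line; sed keeps only the text
  # after the last slash (the filter name); uniq -c counts each name.
  LC_ALL=C grep -a -o '/Filter[[:space:]]*\[\{0,1\}[[:space:]]*/[A-Za-z0-9]*' "$1" |
    sed 's/.*\///' |
    sort | uniq -c | sort -rn
}

# Usage: pdf_filter_tally your.pdf
```

The most frequent filter is listed first; a file showing mostly /DCTDecode is dominated by JPEG images, which further lossless compression will rarely shrink much.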
To (un)compress a PDF, use QPDF.
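For example, with qpdf installed, the sketch below wraps its --stream-data and --object-streams options. The wrapper names pdf_uncompress and pdf_compress are hypothetical. As far as I know, qpdf by default only decodes general-purpose filters such as /FlateDecode, so lossy-encoded images (/DCTDecode) are left as they are in both directions.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around qpdf; qpdf itself must be installed.

# Uncompress: decode the streams qpdf can handle and write every object
# outside of object streams, so the file can be read in a text editor.
pdf_uncompress() {
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  qpdf --stream-data=uncompress --object-streams=disable "$1" "$2"
}

# Compress: (re)compress stream data where qpdf can, and pack eligible
# objects into compressed object streams.
pdf_compress() {
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  qpdf --stream-data=compress --object-streams=generate "$1" "$2"
}

# Usage (file names from the question):
#   pdf_uncompress 3.pdf 3u.pdf
#   pdf_compress 3u.pdf 3c.pdf
```

On a file that is already largely compressed, such as 3.pdf in the question, compressing it again will probably not change its size much, which matches the pdftk results.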
Explanation:

The /Filter keyword inside a PDF file indicates the compression (or encoding) method applied to a stream. Some of them are:
(copied from here).
However, given the complex structure of a PDF file, most of the time some parts (or "streams") of the PDF will already be compressed in some way (and will show up when grepping for /Filter) while other parts will not, so there is no simple yes/no answer to whether a PDF is compressed.
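Because compression is applied per stream, a rough way to put a number on "how compressed" a file is would be to compare the count of /Filter entries with the total number of streams (every stream ends with the endstream keyword). This is a hedged variant of the grep -c idea in this answer, not a standard tool; it assumes GNU grep, pdf_filter_stats is a hypothetical name, and since cross-reference and object streams also carry /Filter, the numbers are only approximate.

```shell
#!/usr/bin/env bash
# Rough compression indicator for a PDF: the number of /Filter entries
# compared with the number of streams (every stream ends with "endstream").
# grep -o prints each occurrence on its own line, so wc -l counts
# occurrences rather than matching lines (which is what grep -c reports).
# pdf_filter_stats is a hypothetical helper name.
pdf_filter_stats() {
  local filtered total
  filtered=$(LC_ALL=C grep -a -o '/Filter' "$1" | wc -l)
  total=$(LC_ALL=C grep -a -o 'endstream' "$1" | wc -l)
  # Arithmetic expansion strips any padding that wc -l adds on some systems.
  echo "filtered=$((filtered)) streams=$((total))"
}

# Usage: pdf_filter_stats your.pdf
```

If "filtered" is close to "streams", nearly every stream is already encoded; a large gap suggests plenty of uncompressed content.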
One way to get a rough idea anyway is to add the -c option to grep, which prints the number of matching lines (roughly the number of filtered streams), so you can see relatively how well the file is compressed. For example, if

    strings "large.pdf" | grep -c /Filter

returns fewer than 10, the file is mostly uncompressed.

Another property related to size in PDFs is whether they have been optimized for quick access, with "optimized" PDFs being larger in size. To quote from Wikipedia:
You can check whether the PDF is optimized using:

    pdfinfo your.pdf
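In pdfinfo's output (from poppler-utils), the relevant line is "Optimized:", which reads yes or no. A minimal sketch for scripting that check, assuming pdfinfo is installed; optimized_field and pdf_is_optimized are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the value of the "Optimized:" field from pdfinfo-style text on
# stdin (normally "yes" or "no"); prints nothing if the field is absent.
optimized_field() {
  sed -n 's/^Optimized: *//p'
}

# Exit status 0 if pdfinfo reports the file as optimized, non-zero otherwise
# (including when the file or pdfinfo itself is missing).
pdf_is_optimized() {
  [ "$(pdfinfo "$1" 2>/dev/null | optimized_field)" = "yes" ]
}

# Usage:
#   pdf_is_optimized your.pdf && echo optimized || echo "not optimized"
```

Splitting the parsing into its own function keeps it testable without a real PDF at hand.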