Splitting a PDF into images, compressing them, and recombining

compression, file conversion, image editing, image processing, pdf

I want to save each page of a PDF file as a separate image, compress the images, and recombine them into a PDF.

Some PDF files I use often are strangely large.
For example, some have about 100 pages but take up about 200 MB.
I suspect this is because they store their pages as images at too high a resolution, or are not properly compressed.
Other files of similar resolution and legibility are often smaller, which makes me think there is still room for compression.
(I have no knowledge of image processing, so this is just a hunch.)

My plan is as follows.
I am asking about steps 1 and 3. Ideally I could do all of this on the command line, so that I can write a wrapper script myself, which should be easy.
Screenshot can do step 1 and Preview can do step 3, but it is not clear whether either can be driven from the command line.

  1. Save each page of the PDF as an image.

  2. Filter each image.
    I am not asking about this part, since image processing tools are abundant.
    I can explore appropriate filters myself.
    I find that turning the image black and white reduces the file size while keeping it legible.

  3. Recombine these images into a PDF.
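
The three steps above can be sketched as a small shell function. This is only a sketch under assumptions: it relies on pdftoppm (from poppler) and on ImageMagick's mogrify and convert being installed, for example via Homebrew, and the names run, pdf_via_images, and DRY_RUN are made up for illustration.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: pdf_via_images IN.pdf OUT.pdf [DPI] [WORK_DIR]
pdf_via_images() {
    in_pdf=$1
    out_pdf=$2
    dpi=${3:-150}                  # rendering resolution, an arbitrary default
    work_dir=${4:-$(mktemp -d)}    # scratch directory for the page images

    # 1. Render every page to a PNG named page-<number>.png.
    run pdftoppm -r "$dpi" -png "$in_pdf" "$work_dir/page" || return 1

    # 2. Filter each image in place; here, reduce it to pure black and white.
    run mogrify -monochrome "$work_dir"/page-*.png || return 1

    # 3. Recombine the pages into one PDF. The glob expands in name order,
    #    which matches page order as long as the page numbers are zero-padded.
    run convert "$work_dir"/page-*.png -compress Group4 "$out_pdf"
}
```

Setting DRY_RUN=1 before calling pdf_via_images in.pdf out.pdf prints the three commands instead of running them, which is a cheap way to check the pipeline before touching a large file. Step 2 is where other filters would be swapped in.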

If you have a completely different way to compress a PDF, that is welcome too.
Perhaps somebody has already wrapped the whole process, so that I need not reinvent the wheel.

Best Answer

Converting a PDF that is mainly text into images will almost certainly increase the file size, not decrease it. PDFs are quite efficient at storing text; converting to images negates that, as you are then just storing images.

Preview includes a Reduce File Size option for PDFs. Open the PDF with Preview, choose File → Export and select Quartz Filter: Reduce File Size. You can also choose Black & White here which may also reduce the file size.
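
For a command-line route in the same spirit, one option is Ghostscript, which is a different tool from Preview's Quartz filter and will not produce identical results. The sketch below assumes gs is installed; the wrapper name gs_reduce and its default preset are hypothetical choices, not something from Preview.

```shell
# Hypothetical wrapper: rewrite a PDF through Ghostscript's pdfwrite device.
# The third argument picks one of Ghostscript's presets:
# screen, ebook, printer, or prepress (roughly, smallest to largest output).
gs_reduce() {
    in_pdf=$1
    out_pdf=$2
    quality=${3:-ebook}
    gs -sDEVICE=pdfwrite -dPDFSETTINGS="/$quality" \
       -dNOPAUSE -dBATCH -dQUIET \
       -sOutputFile="$out_pdf" "$in_pdf"
}
```

For example, gs_reduce in.pdf out.pdf screen asks for the most aggressive downsampling of embedded images, while text stays as text.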

If you really want to convert your PDF into a PDF of images, you can use ImageMagick.

convert /path/to/in.pdf -resize 100% -compress Group4 /path/to/out.pdf
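
Because rasterizing can make a text-heavy PDF larger rather than smaller, it may be worth comparing sizes before keeping the result. A minimal sketch, with file_size and smaller_file as hypothetical helper names:

```shell
# Size of a file in bytes; tr strips the padding some wc implementations add.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Print the path of whichever file is smaller, preferring the original on a tie.
smaller_file() {
    original=$1
    candidate=$2
    if [ "$(file_size "$candidate")" -lt "$(file_size "$original")" ]; then
        echo "$candidate"
    else
        echo "$original"
    fi
}
```

Calling smaller_file /path/to/in.pdf /path/to/out.pdf after the conversion prints the file worth keeping.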