Linux – How to rasterize all of the text in a PDF

linux ocr pdf pdftk

You know when you have a PDF that is a scan of a document, and it's a really huge file because it just stores a picture of each scanned page?

And there are OCR tools which can help you to make a proper document which just stores the text?

Well, I need the reverse of that! Let's say I have a perfect PDF document generated with pdflatex and I need to turn it into such a "huge" PDF, which looks exactly the same when printed on paper (at a certain dpi value), but is just a picture of the original.

My initial idea is to turn the PDF into a series of JPGs and then back into a PDF, but perhaps there is a canonical way to do that?


In case you wonder why I would want to do such a thing: I'm currently stuck with a network printer, which is not maintained by me, and which randomly drops characters in printed files! So until someone figures out what's wrong there, I want this as a workaround.

Best Answer

You could test whether image-based PDFs are affected as well. First convert the PDF to a (multipage) TIFF, e.g. with Ghostscript:

gs -sDEVICE=tiffg4 -o sample.tif sample.pdf

Then convert the TIFF to PDF, e.g.:

tiff2pdf -z -f -F -pA4 -o sample-img.pdf sample.tif

This results in a PDF file whose pages are images instead of text.
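Since the question asks for a specific dpi, the two commands above could be wrapped in a small script that passes the resolution to Ghostscript with its -r option. This is only a sketch: the function name rasterize_pdf, the run helper, the DRY_RUN variable, and the 300 dpi default are assumptions made for illustration. Keep in mind that tiffg4 produces 1-bit black-and-white pages, which suits plain text but not grayscale or color figures.

```shell
#!/bin/sh
# Sketch: rasterize a PDF via a multipage TIFF (all names here are illustrative).
# Usage: rasterize_pdf input.pdf output.pdf [dpi]
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rasterize_pdf() {
    in=$1
    out=$2
    dpi=${3:-300}            # assumed default; pick what the printer needs
    tif="${out%.pdf}.tif"    # intermediate multipage TIFF next to the output

    # Step 1: render every page as a 1-bit, G4-compressed TIFF at the given dpi.
    run gs -q -sDEVICE=tiffg4 -r"$dpi" -o "$tif" "$in" || return 1

    # Step 2: wrap the TIFF pages into a PDF (same flags as the command above).
    run tiff2pdf -z -f -F -pA4 -o "$out" "$tif" || return 1

    # Step 3: remove the intermediate TIFF.
    run rm -f "$tif"
}
```

For example, `rasterize_pdf sample.pdf sample-img.pdf 600` would render at 600 dpi; running it first with DRY_RUN=1 shows the exact commands without touching any files.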

Alternatively, if your system supports printing TIFF files, try printing the TIFF directly.

There is also the option of using pdf2ps to convert the PDF to PostScript, which, if it works, would likely be preferable.
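A minimal sketch of the pdf2ps route, assuming the pdf2ps script that ships with Ghostscript is on the PATH. The helper name pdf_to_ps is made up for this example, and the lpr call in the comment assumes a working default print queue.

```shell
#!/bin/sh
# Sketch: convert a PDF to PostScript and print the name of the result.
# pdf_to_ps is an illustrative helper, not part of Ghostscript.
pdf_to_ps() {
    in=$1
    ps="${in%.pdf}.ps"          # sample.pdf -> sample.ps
    pdf2ps "$in" "$ps" || return 1
    echo "$ps"
}

# Example (assumes lpr and a default printer are configured):
#   ps_file=$(pdf_to_ps sample.pdf) && lpr "$ps_file"
```

Whether this avoids the dropped characters depends on how the printer handles PostScript, so it is worth testing with a single page first.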
