Bash – detect if PDF file is made from images

bash, pdf, shell-script

I'm trying to pre-process a huge number of PDF files, many of which contain images rather than actual text, so that I can move the image-based ones to a separate location for OCR processing.

The problem is that I need to detect whether a PDF is image-based before running OCR, and I have had no success so far.
Running "pdffonts filename" is supposedly the correct approach, but image-only PDFs report fonts too!
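One alternative check, which is an assumption of mine and not something proposed in this thread: instead of asking pdffonts whether fonts are declared, ask pdftotext (also from poppler-utils) whether any text can actually be extracted. If the fonts reported for the image-only files are never used to draw text, extraction should come back empty. The function names below are hypothetical, and this is a heuristic sketch rather than a definitive classifier.

```shell
#!/usr/bin/env bash
# Sketch (an assumption, not from the thread): rather than asking pdffonts
# whether fonts are declared, ask pdftotext whether any text comes out.
# Requires pdftotext from poppler-utils. Function names are hypothetical.
set -uo pipefail

# Read text on stdin and print the number of non-whitespace bytes.
# pdftotext separates pages with form feeds, which [:space:] also removes.
count_visible_chars() {
  tr -d '[:space:]' | wc -c | tr -d ' '
}

# Succeed if the PDF yields no extractable text (a likely OCR candidate).
# Returns 2 if pdftotext cannot read the file, so it is not reported.
has_no_text() {
  local text
  text=$(pdftotext -q "$1" -) || return 2
  [ "$(printf '%s' "$text" | count_visible_chars)" -eq 0 ]
}

for pdf in "$@"; do
  if has_no_text "$pdf"; then
    echo "no text layer: $pdf"
  fi
done
```

Note that a scanned PDF which already carries an invisible OCR text layer will not be reported by this check, which may be exactly what you want when deciding what still needs OCR.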

Best Answer

pdfimages -list filename.pdf

This should do the trick: it lists the images embedded in the PDF file, one per line, together with the page each image appears on.
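A minimal sketch of how "pdfimages -list" could drive the batch sorting described in the question, assuming poppler-utils (which provides pdfimages and pdfinfo) is installed. The rule that a PDF counts as image-based when every page contains at least one image is my own assumption, not part of the answer, and the needs_ocr directory and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: move PDFs whose every page contains an image into a hypothetical
# needs_ocr/ directory. Heuristic (an assumption): a PDF made from images
# has at least one image on each page. Requires poppler-utils.
set -euo pipefail

# Read `pdfimages -list` output on stdin and print how many distinct pages
# contain at least one image. Lines 1-2 are the column header and the
# dashed separator; column 1 of every remaining line is the page number.
count_image_pages() {
  awk 'NR > 2 && NF > 0 && !seen[$1]++ { n++ } END { print n + 0 }'
}

# Print the page count from the "Pages:" line of pdfinfo.
count_pages() {
  pdfinfo "$1" 2>/dev/null | awk '/^Pages:/ { print $2 }'
}

# Succeed if every page of the PDF contains at least one image.
is_image_pdf() {
  local pdf=$1 pages image_pages
  pages=$(count_pages "$pdf")
  image_pages=$(pdfimages -list "$pdf" 2>/dev/null | count_image_pages)
  [ -n "$pages" ] && [ "$pages" -gt 0 ] && [ "$image_pages" -ge "$pages" ]
}

main() {
  local ocr_dir=needs_ocr pdf
  for pdf in "$@"; do
    if is_image_pdf "$pdf"; then
      mkdir -p "$ocr_dir"
      mv -- "$pdf" "$ocr_dir/"
      echo "image-based: $pdf"
    fi
  done
}

main "$@"
```

Run it with the PDFs as arguments, e.g. ./sort_pdfs.sh *.pdf (the script name is hypothetical). Requiring an image on every page, rather than any image at all, keeps text documents with an occasional logo or figure out of the OCR queue, at the cost of missing documents that mix scanned and text pages.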
