I'm trying to pre-process a huge number of PDF files, many of which are not actually text but images, so that I can move them to a proper location for OCR processing.
The problem is that I've tried to detect whether a PDF is image-based before running OCR, but with no success so far.
Using "pdffonts filename" is supposed to be the correct approach, but image-only PDFs have fonts too!
Best Answer
Should do the trick. This gives you a list of images contained in the PDF file.
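
The exact command is not shown in the answer above. One tool that does print a list of the images in a PDF is Poppler's `pdfimages` with the `-list` option, so here is a minimal sketch assuming that is what was meant. The helper names `list_images` and `pages_with_images` are made up for illustration, and the parsing assumes the usual table layout where the first column is the page number.

```python
import subprocess


def list_images(pdf_path):
    """Run `pdfimages -list` (Poppler must be installed) and return its text output."""
    result = subprocess.run(
        ["pdfimages", "-list", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def pages_with_images(listing):
    """Return the set of page numbers that have at least one image row.

    Assumption: each image row starts with the page number. The header and
    the dashed separator line are skipped because their first field is not
    a number.
    """
    pages = set()
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            pages.add(int(fields[0]))
    return pages


if __name__ == "__main__":
    import sys

    found = pages_with_images(list_images(sys.argv[1]))
    print("pages with images:", sorted(found))
```

An image on every page is only a hint that the file may need OCR, not proof, since a normal text PDF can also carry a logo or background image on each page. You would probably want to combine this with another check before moving files.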