In Adobe Acrobat (I'm using Pro DC if that matters), there are three options for OCR:
- "Searchable Image".
- "Searchable Image (Exact)".
- "Editable Text and Images".
What are the differences between these three options?
In particular, what determines the output file size? Right now I've been running both the 1st and 3rd options and it seems that sometimes one is bigger and sometimes the other is bigger (and the differences can be substantial).
What (if any) are the trade-offs between quality, file size, and speed of OCR processing?
Best Answer
The Adobe Help article "Scan a paper document to PDF", section "Recognize Text - General Settings dialog box", defines the scan modes as:
I will analyze the effect of these options on the output file size.
All options keep the image, which is probably a large object.
Searchable Image rotates the image, which might change its size, making it larger or smaller depending on the image re-encoding method used internally by Adobe.
Downsample To can reduce the image resolution and so reduce its size, but the amount of space gained (or lost) depends on the resampling method used internally by Adobe.
Editable Text & Images synthesizes a new font, which is then embedded in the PDF and adds several dozen kilobytes to the output size.
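To put the Downsample To point in perspective, here is a rough sketch of how the raw pixel count of a scanned page scales with resolution. The page size (US Letter) and the 600/300 dpi values are my own example figures, not Acrobat defaults, and the compressed size inside the PDF will not scale this neatly because it depends on the encoder.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Raw pixel count of a scan with the given physical size (inches) and resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


# Example figures only: a US Letter page (8.5 x 11 in) at two resolutions.
full = pixel_count(8.5, 11, 600)
half = pixel_count(8.5, 11, 300)

print(f"600 dpi: {full:,} pixels")
print(f"300 dpi: {half:,} pixels")
print(f"ratio:   {full / half:.1f}x")
```

Halving the resolution quarters the number of pixels, which is why downsampling usually matters more for file size than the choice of OCR mode.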
All in all, there is no clear method for creating the smallest PDF. The amount gained (or lost) depends on both the images being OCR'ed and how efficiently they can be re-compressed by Adobe.
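Since the result depends on the document, the practical approach is to run each mode on the same scan and compare the outputs. Below is a minimal sketch for doing that comparison; the file names are hypothetical placeholders for whatever you saved from Acrobat, and the script only measures files on disk, it does not drive Acrobat in any way.

```python
import os


def rank_by_size(paths):
    """Return (path, size_in_bytes) pairs sorted from smallest to largest."""
    sizes = [(p, os.path.getsize(p)) for p in paths]
    return sorted(sizes, key=lambda item: item[1])


# Hypothetical output names, one per OCR mode; replace with your own files.
candidates = [
    "scan_searchable.pdf",
    "scan_searchable_exact.pdf",
    "scan_editable.pdf",
]

# Skip any file that does not exist so the script runs even with a partial set.
existing = [p for p in candidates if os.path.exists(p)]

for path, size in rank_by_size(existing):
    print(f"{size / 1024:10.1f} KiB  {path}")
```

Running this over a few representative scans should show quickly which mode tends to win for your kind of material.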
If the aim is to save space, I would suggest using Editable Text & Images but, as described in this Adobe Acrobat article, specifying "Use available system font" in Settings, which might avoid the custom font. You may also delete the images if the OCR'ed text is enough.