Understanding OCR options in Adobe Acrobat: “Searchable Image”, “Searchable Image (Exact)”, and “Editable Text and Images”

adobe-acrobat ocr

In Adobe Acrobat (I'm using Pro DC if that matters), there are three options for OCR:

"Searchable Image".
"Searchable Image (Exact)".
"Editable Text and Images".

What are the differences between these three options?

In particular, what determines the output file size? I've been running both the 1st and 3rd options, and sometimes one output is larger and sometimes the other is (the differences can be substantial).

What (if any) are the trade-offs between quality, file size, and speed of OCR processing?

Best Answer

The Adobe Help article "Scan a paper document to PDF", section "Recognize Text - General Settings dialog box", defines these options as follows:

Searchable Image

Ensures that text is searchable and selectable. This option keeps the original image, deskews it as needed, and places an invisible text layer over it. The selection for Downsample Images in this same dialog box determines whether the image is downsampled and to what extent.

Searchable Image (Exact)

Ensures that text is searchable and selectable. This option keeps the original image and places an invisible text layer over it. Recommended for cases requiring maximum fidelity to the original image.

Editable Text & Images

Synthesizes a new custom font that closely approximates the original, and preserves the page background using a low-resolution copy.

Downsample To

Decreases the number of pixels in color, grayscale, and monochrome images after OCR is complete. Choose the degree of downsampling to apply. Higher-numbered options do less downsampling, producing higher-resolution PDFs.

I will analyze the effect of these options on the output file size.

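All options keep a copy of the page image (for Editable Text & Images, a low-resolution copy of the background), and this image is usually the largest object in the file.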

Searchable Image deskews (rotates) the image as needed. The rotated image has to be re-encoded, so it may end up larger or smaller, depending on the encoding method Adobe uses internally.
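To illustrate one way deskewing could enlarge an image, here is a minimal sketch that computes the bounding box of a rotated page. It assumes, hypothetically, that the rotated image is kept whole on an enlarged canvas; the Help text quoted above doesn't say whether Acrobat crops back to the original dimensions, and this ignores compression entirely. The function name `rotated_canvas_size` is made up for the example.

```python
import math


def rotated_canvas_size(width_px: int, height_px: int, angle_deg: float) -> tuple[int, int]:
    """Bounding-box size of a width x height image rotated by angle_deg.

    Hypothetical assumption: the deskewed image is kept whole on an
    enlarged canvas rather than cropped back to the original size.
    """
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    new_w = math.ceil(width_px * c + height_px * s)
    new_h = math.ceil(width_px * s + height_px * c)
    return new_w, new_h


# A US Letter page scanned at 300 dpi is 2550 x 3300 pixels.
w, h = 2550, 3300
for angle in (0.0, 0.5, 2.0):
    nw, nh = rotated_canvas_size(w, h, angle)
    growth = (nw * nh) / (w * h) - 1
    print(f"{angle:>4} deg -> {nw} x {nh} px ({growth:+.1%} pixels)")
```

Even a small skew correction adds a border of extra pixels under this assumption, which is one reason the re-encoded image need not shrink.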

Downsample To can reduce the image resolution and hence its size, but the amount of space gained (or lost) depends on the resampling method Adobe uses internally.
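Because pixel count grows with the square of the resolution, halving the dpi cuts the raw image data to about a quarter. A minimal sketch of that arithmetic for uncompressed data; the stored size in a real PDF will be smaller and depends on the compression Acrobat applies, which this does not model. The helper name `uncompressed_image_bytes` is hypothetical.

```python
def uncompressed_image_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw (uncompressed) size in bytes of a scanned page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each row of samples starts on a byte boundary, as in PDF image data.
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px


# US Letter page (8.5 x 11 in): 24-bit color vs. 1-bit monochrome.
for dpi in (600, 300, 150):
    color = uncompressed_image_bytes(8.5, 11, dpi, 24)
    mono = uncompressed_image_bytes(8.5, 11, dpi, 1)
    print(f"{dpi} dpi: color {color / 1e6:.1f} MB, monochrome {mono / 1e6:.2f} MB")
```

The raw data shrinks predictably, but whether the compressed result shrinks by the same factor depends on the image content and the codec, which is why the savings vary.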

Editable Text & Images synthesizes a new font, which is then embedded in the PDF and can add several dozen kilobytes to the output size.

All in all, no single option reliably produces the smallest PDF. The amount gained (or lost) depends both on the images being OCR'd and on how efficiently Adobe can re-compress them.
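Since the outcome varies per document, a practical approach is to run each option on a copy of the same scan and compare the results. A small sketch, assuming you have saved one output file per option; the script name and file names are hypothetical.

```python
import sys
from pathlib import Path


def report_sizes(paths: list[Path]) -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for each file, smallest first."""
    sizes = [(p.name, p.stat().st_size) for p in paths]
    return sorted(sizes, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical usage, one PDF per OCR option saved from Acrobat:
    #   python compare_sizes.py scan_searchable.pdf scan_exact.pdf scan_editable.pdf
    for name, size in report_sizes([Path(arg) for arg in sys.argv[1:]]):
        print(f"{size / 1024:10.1f} KiB  {name}")
```

Repeating this on a few representative scans gives a better guide than any general rule.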

If the aim is to save space, I would suggest using Editable Text & Images, but, as described in this Adobe Acrobat article, enable the "Use available system font" setting, which may avoid embedding the custom font. You could also delete the images if the OCR'd text alone is enough.
