Ubuntu – How to extract text from images

ocr software-recommendation

How can I extract text from images?

I am not talking about scanned files, but garden-variety images, such as a high-resolution photo of a neatly handwritten blackboard taken in class, or a photographed page from a recipe book that you want in text format.

Is there any free and open-source software for that?

I tried tesseract, and the results were awful.

Best Answer

Extracting text from images is called optical character recognition (OCR), and the Ubuntu wiki has a page dedicated to it. From that page:

Available OCR tools

The Ubuntu Universe repositories contain the following OCR tools:

gocr - A command line OCR
fuzzyocr - spamassassin plugin to check image attachments
libhocr0 - Hebrew OCR
ocrad - Optical Character Recognition program
ocrfeeder - Document layout analysis and optical character recognition system
ocropus - document analysis and OCR system
tesseract-ocr

The Ubuntu Multiverse repositories also contain:

cuneiform - multi-language OCR system
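To see which of the command-line tools listed above are actually installed, something like the following Python sketch may help. The mapping from package name to command name is an assumption (for example, that the tesseract-ocr package installs a command called tesseract), and the file names are hypothetical. The tesseract call is only built and printed, not executed.

```python
import shutil

# Assumed mapping from package name (as listed above) to the command it installs.
PACKAGE_TO_COMMAND = {
    "gocr": "gocr",
    "ocrad": "ocrad",
    "tesseract-ocr": "tesseract",
    "cuneiform": "cuneiform",
}


def available_tools(which=shutil.which):
    """Return {package: path} for every listed OCR command found on PATH."""
    found = {}
    for package, command in PACKAGE_TO_COMMAND.items():
        path = which(command)
        if path is not None:
            found[package] = path
    return found


def tesseract_command(image, output_base, lang="eng"):
    """Build a tesseract call; the text ends up in output_base + '.txt'."""
    return ["tesseract", image, output_base, "-l", lang]


if __name__ == "__main__":
    for package, path in available_tools().items():
        print(f"{package}: {path}")
    # Hypothetical file names, printed rather than executed.
    print(" ".join(tesseract_command("recipe.jpg", "recipe")))
```

The returned list can be passed to subprocess.run once the image path is real. Trying more than one engine on the same photo is a cheap way to compare results.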

Some packages are outdated, but unofficial fresh ones can be found in the Alex_P PPA (PPA adding code: ppa:alex-p/notesalexp). If you have never used a PPA, check how to add software from a PPA.

Edit: As a comment points out, Clara OCR exists too, but its package got stuck at Hardy and its website was last updated in 2009.
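For reference, the usual steps for adding a PPA and installing from it can be sketched as below. This is a minimal Python sketch that only prints the commands. It assumes the PPA ships a package under the name you pass in (tesseract-ocr is used here purely as an illustration), and the helper name ppa_install_commands is made up for this example.

```python
import shlex

# PPA identifier quoted in the answer.
PPA = "ppa:alex-p/notesalexp"


def ppa_install_commands(package, ppa=PPA):
    """Return the commands that add the PPA, refresh the index, and install."""
    return [
        ["sudo", "add-apt-repository", ppa],
        ["sudo", "apt-get", "update"],
        ["sudo", "apt-get", "install", package],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for command in ppa_install_commands("tesseract-ocr"):
        print(shlex.join(command))
```

Printing instead of executing keeps the sketch safe to run; the three lines can then be pasted into a terminal.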
