Ubuntu – How to extract text from images

ocr software-recommendation

How can I extract text from images?

I am not talking about scanned files, but garden-variety images: for example, a high-definition photo of a neatly handwritten blackboard taken in class, or a photographed page from a recipe book whose recipe you want in text format.

Is there any free and open-source software for that?

I tried tesseract, and the results were awful.

Best Answer

Extracting text from images is called OCR (optical character recognition), and Ubuntu has a wiki page dedicated to it. From that page:

Available OCR tools

The Ubuntu Universe repositories contain the following OCR tools:

  1. gocr - A command line OCR
  2. fuzzyocr - spamassassin plugin to check image attachments
  3. libhocr0 - Hebrew OCR
  4. ocrad - Optical Character Recognition program
  5. ocrfeeder - Document layout analysis and optical character recognition system
  6. ocropus - document analysis and OCR system
  7. tesseract-ocr

The Ubuntu Multiverse repositories also contain:

  1. cuneiform - multi-language OCR system
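
Several of the tools listed above are command-line programs, so it is easy to try more than one on the same picture and compare the output. Below is a minimal sketch of that idea, not a definitive recipe. It assumes the binaries are named tesseract, gocr and ocrad, and that they accept the basic invocations shown in the comments; check each tool's man page on your release. The function names (installed_tools, build_command, run_ocr) are made up for this illustration.

```python
import shutil
import subprocess
import sys

IMG = "{img}"  # placeholder that build_command() swaps for the image path

# Assumed basic invocations for three of the tools listed above.
# gocr and ocrad traditionally expect PNM-family input (PBM/PGM/PPM),
# so a JPEG or PNG photo may need converting first.
OCR_COMMANDS = {
    "tesseract": ["tesseract", IMG, "stdout"],  # "stdout" = print the text
    "gocr": ["gocr", "-i", IMG],
    "ocrad": ["ocrad", IMG],
}


def installed_tools():
    """Return the names of the known OCR tools that are found on PATH."""
    return [name for name in OCR_COMMANDS if shutil.which(name)]


def build_command(tool, image_path):
    """Fill the image path into the command template for `tool`."""
    return [image_path if part == IMG else part for part in OCR_COMMANDS[tool]]


def run_ocr(tool, image_path):
    """Run `tool` on the image and return whatever it printed to stdout."""
    result = subprocess.run(
        build_command(tool, image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool fails
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file names): python3 try_ocr.py recipe.pnm
    for tool in installed_tools():
        print(f"--- {tool} ---")
        print(run_ocr(tool, sys.argv[1]))
```

Running every installed tool on the same image makes it easier to judge which one copes best with a given photo before investing time in any of them.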

Some packages are outdated, but unofficial, more recent builds can be found in the Alex_P PPA (PPA identifier: ppa:alex-p/notesalexp). If you have never used a PPA, check how to add software from a PPA.
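
For readers who have not added a PPA before, the usual sequence is: add the PPA, refresh the package lists, then install the package. The sketch below only prints those commands so they can be reviewed before being run by hand. The helper name ppa_install_commands is hypothetical, and tesseract-ocr is used purely as an example package from the list above; whether this PPA carries a newer build of it for your release is an assumption you should verify.

```python
import shlex


def ppa_install_commands(ppa, package):
    """Return the commands that add `ppa` and then install `package`."""
    if not ppa.startswith("ppa:"):
        raise ValueError("expected an identifier of the form ppa:user/name")
    return [
        ["sudo", "add-apt-repository", "-y", ppa],     # register the PPA
        ["sudo", "apt-get", "update"],                 # refresh package lists
        ["sudo", "apt-get", "install", "-y", package], # install from it
    ]


if __name__ == "__main__":
    # Print only; nothing is executed. The package name is an assumption.
    for cmd in ppa_install_commands("ppa:alex-p/notesalexp", "tesseract-ocr"):
        print(shlex.join(cmd))
```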

Edit: As noted in a comment, Clara OCR also exists, but its package got stuck at Hardy, and its website shows 2009 as the last update.