
Optical Character Recognition

Optical Character Recognition (OCR) is a method of converting well-formed text, such as typed text, in a scanned image file into text that can be edited in a word processor.

Limited OCR capability is frequently included with the scanning application suites that are bundled with most scanners, e.g. HP PrecisionScan. Commercial applications such as Abbyy FineReader and OmniPage Pro offer greater accuracy and the ability to recognize more complex page layouts.

HP PrecisionScan

Basic OCR capability is included in PrecisionScan and is performed automatically when the Text option is selected.

PrecisionScan does not provide a way to open existing image files, so OCR can only be performed on the scanner's preview image.

OmniPage Pro

OmniPage Pro can open a previously scanned image file and recognize complex page layouts. In this respect it might be more appropriately characterized as an Optical Document Reader application.

The image below illustrates the first step in the conversion process, image file acquisition.

Original image display after file acquisition.


OCR zones identify the areas to be converted.

OCR zones are placed on the image to identify areas that will be converted.
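As a rough illustration of the idea behind zones, the sketch below models each zone as a rectangle in pixel coordinates and cuts the matching region out of a page image. It does not reflect how OmniPage Pro stores zones internally; the `Zone` class, the `crop` helper, and the tiny list-of-rows "image" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular OCR zone in pixel coordinates (hypothetical structure)."""
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def crop(image, zone):
    """Return the part of the image covered by the zone.

    The image is a list of pixel rows, each row a list of values.
    """
    return [row[zone.left:zone.right] for row in image[zone.top:zone.bottom]]


# A tiny 4x6 "page": 1 = ink, 0 = background.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Two zones, one around each block of ink.
zones = [Zone(1, 1, 3, 3), Zone(4, 1, 5, 3)]

# Only these regions would be handed to the recognition step.
regions = [crop(page, z) for z in zones]
```

Restricting recognition to the zoned regions is what lets a multi-column or mixed text-and-picture page be converted in a sensible reading order.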


The conversion is initiated and the resulting text is proofed.

Final editing can be performed with OmniPage Pro's text editor, and the file can be saved in a number of common document formats. Conversion accuracy from the slightly degraded original image was 96%: four words were not recognized properly and there were several punctuation errors.

Conversion result is a text file that can be edited with the resident Text Editor.
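The article does not say how the 96% accuracy figure was calculated. One common way to estimate such a figure is to compare the recognized text with a hand-corrected reference, word by word. The following is a minimal sketch of that approach using Python's standard `difflib` module; the function name `word_accuracy` and the sample strings are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def word_accuracy(reference: str, recognized: str) -> float:
    """Fraction of reference words that also appear, in order, in the OCR output."""
    ref_words = reference.split()
    ocr_words = recognized.split()
    if not ref_words:
        return 1.0
    # Align the two word sequences and count the words that match.
    matcher = SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Hypothetical sample: one word ("brown") was misread as "brovvn".
reference = "the quick brown fox jumps over the lazy dog"
recognized = "the quick brovvn fox jumps over the lazy dog"

print(f"{word_accuracy(reference, recognized):.0%}")
```

Because the text is split on whitespace, punctuation stays attached to its word, so a misread comma or period counts as a word error under this measure. A character-level comparison would score the same output more leniently.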
