Optical Character Recognition (OCR) Limitations

What are the Limitations of OCR?

Although Optical Character Recognition (OCR) scanning technology has improved rapidly over the years, it still has limitations related to source materials and character formatting.

Text from a source with a font size of less than 12 points will result in more errors.
Most document formatting is lost during text scanning, except for paragraph marks and tab stops. Bold, italics and underline are sometimes recognized, depending on your software.
The output from a finished text scan will be a single-column, editable computer file. This file will always require spellchecking and proofreading, as well as reformatting to the desired final layout.
Scanning plain text files or spreadsheet printouts usually works; however, the data needs to be reformatted to match the original.
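
The spellchecking pass on scanned output can be partly automated before a person proofreads the file. The following is only a minimal sketch in Python using the standard library: the function name `flag_suspect_words` and the tiny `VOCABULARY` set are hypothetical, chosen for illustration, and a real workflow would load a full dictionary for the document's language.

```python
import difflib
import re

# Hypothetical reference vocabulary (an assumption for this sketch).
# A real workflow would load a complete word list instead.
VOCABULARY = {
    "the", "scanned", "document", "text", "is", "in",
    "bold", "italic", "and", "underline",
}


def flag_suspect_words(ocr_text, vocabulary=VOCABULARY, cutoff=0.75):
    """Return (word, suggestions) pairs for words missing from the vocabulary.

    Words are compared case-insensitively. Suggestions are the closest
    vocabulary entries according to difflib's similarity ratio.
    """
    candidates = sorted(vocabulary)
    flagged = []
    # Treat runs of letters and digits as words, so that a misread such as
    # a digit in the middle of a word is kept together and flagged.
    for word in re.findall(r"[A-Za-z0-9]+", ocr_text):
        lower = word.lower()
        if lower in vocabulary:
            continue
        suggestions = difflib.get_close_matches(
            lower, candidates, n=3, cutoff=cutoff
        )
        flagged.append((word, suggestions))
    return flagged


if __name__ == "__main__":
    sample = "The text is in bold, ltalic and underline"
    for word, suggestions in flag_suspect_words(sample):
        print(f"Check '{word}': did you mean {suggestions}?")
```

A check like this only narrows down where to look. It cannot catch a misread that happens to produce another valid word, so the proofreading step is still needed.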
Source materials that often cause issues are:
- Forms
- Small text
- Blurry copies
- Mathematical formulas
- Draft copies
- Colored paper
- Handwritten text
- Unusual or script-type fonts
