We can use optical character recognition (OCR) technology to capture data for many projects. OCR lets a computer recognize the characters, lines, tables, or graphs in a document, so a person does not have to read and type the data.
OCR works best when:
- The printing on each document is clear, with consistent spacing between letters and sentences, between table columns, and around graphics. Drawings, graphics, and artwork also need clearly defined lines. This lets the scanner and OCR software recognize each individual character of type or line of a drawing.
- Either the goal is to capture all of the data on a document (full-text data capture), or the data appears in the same place on every form. For example, when the name and address are typed in the same place on each form, an OCR engine can usually recognize that information and place it in the correct database field.
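To illustrate the uniform-form case, here is a minimal Python sketch of assigning recognized words to database fields by position. It assumes the OCR engine has already returned each word with a pixel bounding box, and it uses a hypothetical template that records where each field sits on the form. The names `Word`, `TEMPLATE`, and `assign_fields`, and all coordinates, are illustrative and not part of any particular OCR product.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word as reported by an OCR engine, with its pixel bounding box."""
    text: str
    left: int
    top: int
    width: int
    height: int


# Hypothetical template: field name -> (left, top, right, bottom) zone in pixels.
# On a uniform form, each field appears in the same zone on every page.
TEMPLATE = {
    "name": (100, 50, 600, 90),
    "address": (100, 100, 600, 180),
}


def assign_fields(words, template):
    """Place each recognized word in the field whose zone contains its center."""
    fields = {name: [] for name in template}
    for word in words:
        cx = word.left + word.width / 2
        cy = word.top + word.height / 2
        for name, (left, top, right, bottom) in template.items():
            if left <= cx < right and top <= cy < bottom:
                fields[name].append(word)
                break  # a word belongs to at most one field
    # Read words top-to-bottom, then left-to-right, and join them into one string.
    return {
        name: " ".join(w.text for w in sorted(ws, key=lambda w: (w.top, w.left)))
        for name, ws in fields.items()
    }


words = [
    Word("Jane", 110, 60, 40, 20),
    Word("Doe", 160, 60, 35, 20),
    Word("12", 110, 110, 20, 20),
    Word("Main", 135, 110, 40, 20),
    Word("St", 180, 110, 20, 20),
    Word("Page", 700, 900, 40, 20),  # outside every zone, so it is ignored
]
print(assign_fields(words, TEMPLATE))
# {'name': 'Jane Doe', 'address': '12 Main St'}
```

Words that fall outside every zone, such as a page number, are simply dropped. Sorting by top and then left is a simplification that assumes words on the same line share a top coordinate; real scans usually need some tolerance for slight skew.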
With very clear type or uniform forms, OCR can often achieve accuracy rates of 95% to 97% or higher. To push accuracy further, we have staff members visually verify key information, along with any field the OCR software flags because it has low "confidence" in its result.
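The verification step can be sketched as a simple routing rule: send a field to a staff member if it is designated as key information or if the engine's confidence falls below a cutoff. This Python sketch assumes the engine reports a per-field confidence on a 0 to 100 scale; the names `FieldResult` and `needs_review`, the threshold of 90, and the `account_number` key field are hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class FieldResult(NamedTuple):
    name: str
    text: str
    confidence: float  # engine-reported confidence, assumed to be on a 0-100 scale


KEY_FIELDS = {"account_number"}  # hypothetical: always checked by staff
CONFIDENCE_THRESHOLD = 90.0      # hypothetical cutoff for "low confidence"


def needs_review(results, key_fields=KEY_FIELDS, threshold=CONFIDENCE_THRESHOLD):
    """Return names of fields a staff member should visually verify."""
    return [
        r.name
        for r in results
        if r.name in key_fields or r.confidence < threshold
    ]


results = [
    FieldResult("name", "Jane Doe", 98.5),
    FieldResult("city", "Spr1ngfield", 72.0),      # low confidence: flagged
    FieldResult("account_number", "00123", 99.0),  # key field: always flagged
]
print(needs_review(results))  # ['city', 'account_number']
```

The cutoff would normally be tuned per project: a lower threshold sends fewer fields to reviewers but lets more recognition errors through unchecked.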
For many projects, the source documents are of such poor quality that OCR accuracy cannot get above 90%. In these cases we usually find that double key data entry (two operators type the same data independently and the software flags any differences) achieves better accuracy at a lower cost than OCR with human proofreading.
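Double key data entry rests on a simple comparison: two operators key the same record independently, fields that match are accepted, and mismatched fields are typically sent to a third person to resolve. Here is a minimal Python sketch of that comparison, assuming each keyed record is a dictionary mapping field names to strings. The function name `compare_entries` and the choice to ignore surrounding whitespace are illustrative assumptions, not the behavior of any specific data entry product.

```python
def compare_entries(first, second):
    """Compare two independent keyings of the same record.

    Returns (accepted, mismatches): fields where both operators agree, and
    fields where they differ, as (first_value, second_value) pairs.
    """
    accepted = {}
    mismatches = {}
    for field in first.keys() | second.keys():
        # Missing fields count as empty; surrounding whitespace is ignored.
        a = first.get(field, "").strip()
        b = second.get(field, "").strip()
        if a == b:
            accepted[field] = a
        else:
            mismatches[field] = (a, b)
    return accepted, mismatches


accepted, mismatches = compare_entries(
    {"name": "Jane Doe", "zip": "62704"},
    {"name": "Jane Doe ", "zip": "62740"},  # second operator transposed two digits
)
print(accepted)    # {'name': 'Jane Doe'}
print(mismatches)  # {'zip': ('62704', '62740')}
```

An error survives this check only if both operators make the same mistake in the same field, which helps explain why double keying can outperform OCR plus proofreading on poor-quality source documents.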