- 08.28.08 -

Hand print OCR

Systems for recognizing hand-printed text as it is written have enjoyed considerable success in recent years. Among these are the input systems of personal digital assistants (PDAs), such as those running Palm OS. Their algorithms take advantage of the fact that the order, speed, and direction of each stroke's segments are known at input time. The user can also retrain the computer to recognize his or her own way of writing any particular character. None of this stroke information is available to software that scans paper documents, so accurate recognition of hand-printed documents is still largely an open problem. Accuracy rates of 80% to 90% on neat, clean hand-printed characters can be achieved, but that still translates to dozens of errors per page, making the technology useful only in very limited contexts. This variety of OCR is now commonly known in the industry as ICR, short for intelligent character recognition.
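To illustrate why knowing stroke order and direction helps, here is a minimal Python sketch of one simple on-line approach: each pen stroke is reduced to a sequence of 8-way direction codes, and an unknown stroke is labelled with the nearest stored template by edit distance. The names `direction_codes` and `StrokeRecognizer` are hypothetical, and this is not the algorithm used by Palm OS or any other product; "retraining" is modelled simply as storing the user's own sample as another template.

```python
import math


def direction_codes(points, min_step=1.0):
    """Reduce a time-ordered list of (x, y) pen positions to 8-way direction codes.

    Code 0 = east, 2 = north, 4 = west, 6 = south (y axis pointing up);
    odd codes are the diagonals in between.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if math.hypot(dx, dy) < min_step:
            continue  # ignore jitter while the pen barely moves
        code = round(math.atan2(dy, dx) / (math.pi / 4)) % 8
        if not codes or codes[-1] != code:
            codes.append(code)  # keep only changes of direction
    return codes


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class StrokeRecognizer:
    """Hypothetical nearest-template recognizer for single-stroke characters."""

    def __init__(self):
        self.templates = []  # list of (character, direction codes)

    def train(self, char, points):
        # "Retraining" here = storing the user's own sample as another template.
        self.templates.append((char, direction_codes(points)))

    def classify(self, points):
        codes = direction_codes(points)
        _, char = min((edit_distance(codes, t), c) for c, t in self.templates)
        return char


recognizer = StrokeRecognizer()
recognizer.train("L", [(0, 10), (0, 0), (6, 0)])   # down, then right
recognizer.train("7", [(0, 10), (6, 10), (1, 0)])  # right, then down-left
print(recognizer.classify([(1, 9), (1, 1), (5, 1)]))  # prints L
```

In this toy model, adding a few more samples per character is all the retraining needed. A scanned page offers no such time-ordered pen positions, only the finished ink.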



Cursive OCR

Recognition of cursive text is an active area of research, with recognition rates even lower than those for hand-printed text. Higher recognition rates for general cursive script will likely not be possible without contextual or grammatical information. For example, recognizing entire words from a dictionary is easier than trying to segment and identify individual characters in script. Reading the amount line of a cheque (which always contains a written-out number) is an example where using a smaller dictionary can increase recognition rates greatly. Knowledge of the grammar of the language being scanned can also help determine whether a word is likely to be a verb or a noun, for example, allowing greater accuracy. The shapes of individual cursive characters simply do not contain enough information to recognize all handwritten cursive script accurately (above 98%).
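The cheque example can be sketched as lexicon-constrained post-processing: whatever noisy string a character-level recognizer produces for each word is replaced by the closest word in a small vocabulary, measured by edit distance. This minimal Python sketch assumes a hypothetical English `AMOUNT_WORDS` lexicon and a recognizer that has already split the line into words; approaches that match whole word images against the lexicon directly are not shown.

```python
# Hypothetical lexicon: the few words that can appear on an amount line.
AMOUNT_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
    "forty", "fifty", "sixty", "seventy", "eighty", "ninety", "hundred",
    "thousand", "and", "dollars", "cents",
]


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def snap_to_lexicon(raw_word, lexicon=AMOUNT_WORDS):
    """Return the lexicon word closest to a noisy recognizer guess."""
    guess = raw_word.lower()
    # Ties are broken alphabetically so the result is deterministic.
    return min(lexicon, key=lambda word: (edit_distance(guess, word), word))


def read_amount_line(raw_words):
    """Snap every recognized word on the amount line to the lexicon."""
    return " ".join(snap_to_lexicon(w) for w in raw_words)


print(read_amount_line(["Twcnty", "tliree", "dollaro"]))  # prints twenty three dollars
```

With only a few dozen candidate words, even badly garbled guesses usually have a single clear nearest neighbour, which is why the small dictionary helps so much.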

Music OCR

Early research into recognition of printed sheet music was performed in the mid-1970s at MIT and other institutions. Subsequent efforts concentrated on locating and removing the musical staff lines, leaving the remaining symbols to be recognized and parsed. The first proprietary music-scanning program, MIDISCAN, was released in 1991. Three proprietary products are now available, but music OCR software does not recognize handwritten scores.
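The staff-removal step can be illustrated with a horizontal projection profile, one simple textbook technique: rows of a binarized page in which almost every pixel is dark are taken to be staff lines and erased, except where a dark pixel directly above or below shows that a symbol (such as a note stem) crosses the line. This minimal Python sketch assumes a clean, unskewed image given as lists of 0/1 values; the `min_fill` threshold of 0.8 is an arbitrary illustrative choice, and the sketch is not claimed to be how MIDISCAN or any other product works.

```python
def find_staff_rows(image, min_fill=0.8):
    """Return indices of rows whose fraction of dark (1) pixels is at least min_fill."""
    width = len(image[0])
    return [y for y, row in enumerate(image) if sum(row) / width >= min_fill]


def remove_staff_lines(image, min_fill=0.8):
    """Erase staff rows, keeping pixels that continue vertically into a symbol."""
    staff = set(find_staff_rows(image, min_fill))
    height = len(image)
    out = [row[:] for row in image]  # copy so the input is not modified
    for y in staff:
        for x, pixel in enumerate(image[y]):
            # Neighbours that are themselves staff rows do not count as symbol ink.
            above = image[y - 1][x] if y > 0 and (y - 1) not in staff else 0
            below = image[y + 1][x] if y + 1 < height and (y + 1) not in staff else 0
            if pixel and not (above or below):
                out[y][x] = 0
    return out


page = [
    [0, 0, 0, 1, 0, 0, 0, 0],  # note stem
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # staff line crossed by the stem
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(find_staff_rows(page))        # prints [2]
print(remove_staff_lines(page)[2])  # prints [0, 0, 0, 1, 0, 0, 0, 0]
```

After this step only the stem survives on the former staff row, so the remaining connected shapes can be handed to a symbol classifier.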

MICR

One area where the accuracy and speed of computer character input exceed those of humans is magnetic ink character recognition (MICR), where error rates are around one read error for every 20,000 to 30,000 checks.
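As a rough worked conversion (simple arithmetic on the figures above, not a published specification), the MICR error rate corresponds to the following per-check read accuracy. Note that these rates are per check, whereas the hand-print accuracy figures are per character, so the two are not directly comparable.

```latex
\[
\frac{1}{30\,000} \approx 3.3 \times 10^{-5} \approx 0.0033\%
\;\le\; \text{read error rate per check} \;\le\;
\frac{1}{20\,000} = 5 \times 10^{-5} = 0.005\%
\]
\[
\text{read accuracy per check} = 1 - \text{read error rate}
\approx 99.995\% \text{ to } 99.997\%
\]
```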




Other

A particularly difficult problem for both computers and humans is reading old church baptismal and marriage records, which contain mostly names. The pages may be damaged by age, water, or fire, and the names may be obsolete or have rare spellings. Another research area is cooperative approaches, in which computers assist humans and vice versa. For example, computer image processing techniques can help humans read extremely difficult texts such as the Archimedes Palimpsest or the Dead Sea Scrolls.
For more complex recognition problems, neural networks are commonly used because they can be made largely insensitive to both affine and non-linear transformations of the input.
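One common way to make a network tolerant of such distortions is data augmentation: training it on many randomly transformed copies of each sample glyph. The minimal Python sketch below generates affine-distorted copies (rotation, scaling, shear, translation) of a 0/1 bitmap with nearest-neighbour sampling. The functions `affine_warp` and `augment` and their parameter ranges are illustrative assumptions; non-linear (for example elastic) distortions and the network itself are not shown.

```python
import math
import random


def affine_warp(bitmap, angle_deg=0.0, scale=1.0, shear=0.0, dx=0.0, dy=0.0):
    """Rotate, scale, shear, and shift a 0/1 bitmap about its centre.

    Uses inverse mapping with nearest-neighbour sampling: every output pixel
    is looked up in the source image; pixels that map outside it stay 0.
    """
    h, w = len(bitmap), len(bitmap[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    a = math.radians(angle_deg)
    # Forward matrix M = scale * Rotation(a) * Shear(shear).
    m00 = scale * math.cos(a)
    m01 = scale * (math.cos(a) * shear - math.sin(a))
    m10 = scale * math.sin(a)
    m11 = scale * (math.sin(a) * shear + math.cos(a))
    det = m00 * m11 - m01 * m10  # equals scale**2, non-zero for scale != 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the translation, then apply the inverse of M.
            u, v = x - cx - dx, y - cy - dy
            sx = (m11 * u - m01 * v) / det + cx
            sy = (-m10 * u + m00 * v) / det + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = bitmap[iy][ix]
    return out


def augment(bitmap, n, seed=0):
    """Return n randomly distorted copies of one training glyph."""
    rng = random.Random(seed)
    return [
        affine_warp(
            bitmap,
            angle_deg=rng.uniform(-10, 10),
            scale=rng.uniform(0.9, 1.1),
            shear=rng.uniform(-0.2, 0.2),
            dx=rng.uniform(-1, 1),
            dy=rng.uniform(-1, 1),
        )
        for _ in range(n)
    ]
```

A classifier trained on `augment(glyph, 50)` for every sample sees each character in many slightly different poses, which is one way the insensitivity described above is encouraged in practice.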
A related area is raster-to-vector conversion: converting bitmap images (for example, maps containing drawings, text, and map symbols) into vector graphics, which are easier to work with.
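Part of raster-to-vector conversion can be sketched with the Ramer-Douglas-Peucker algorithm, which reduces a long chain of pixel coordinates to the few vertices of a polyline that stays within a given tolerance of the original. This Python sketch assumes the pixel chain has already been traced from the bitmap (tracing, curve fitting, and symbol recognition are not shown), and the `tolerance` value is an illustrative choice.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)  # a and b coincide
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: keep only the vertices needed within tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]  # the whole chain is close enough to one segment
    # Split at the farthest point and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


# An L-shaped run of traced pixels becomes two vector segments.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(simplify(chain, 0.5))  # prints [(0, 0), (3, 0), (3, 3)]
```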

© OCR-Ghost (Free OCR Software) 2006-2007
Operated by Ghost Networks Inc.