Presentation 2006-11-23
An Introduction to Research on Document Understanding and Character Recognition : Hitachi's Case
Hiroshi Shinjo
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Hitachi has researched document understanding and character recognition for over thirty years. This paper introduces our OCR (Optical Character Recognition) products and research work. Our products fall into two types. One is conventional hardware OCR, which comprises form OCR and mail sorting machines. The other is a new type of OCR product, comprising pen device products and camera-based OCR, intended to open new opportunities for the use of OCR. In this paper, we present our research work from the point of view of these products. First, layout analysis is explained using form OCR as an example. Form layout analysis must be robust to low-quality images because form OCR is in general use. Second, character segmentation and linguistic processing are explained using postal address recognition. To resolve the ambiguity of character segmentation and recognition in an address line, we integrate segmentation, classification, and linguistic interpretation and perform them at the same time. Third, character classification is explained as a technology common to all our OCR products. The classification work is based on directional feature extraction and statistical discriminant models. Finally, we introduce unconventional OCR technologies: a digital pen product and camera-based OCR. The digital pen can link electronic data to paper documents. For camera-based OCR, we have developed a small-scale Kanji character recognition engine.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) form layout recognition / postal address recognition / character recognition / statistical discriminant model / information integration / camera based OCR
Paper # PRMU2006-127
Date of Issue

Conference Information
Committee PRMU
Conference Date 2006/11/16 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Pattern Recognition and Media Understanding (PRMU)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) An Introduction to Research on Document Understanding and Character Recognition : Hitachi's Case
Sub Title (in English)
Keyword(1) form layout recognition
Keyword(2) postal address recognition
Keyword(3) character recognition
Keyword(4) statistical discriminant model
Keyword(5) information integration
Keyword(6) camera based OCR
1st Author's Name Hiroshi Shinjo
1st Author's Affiliation Central Research Laboratory, Hitachi, Ltd.
Date 2006-11-23
Paper # PRMU2006-127
Volume (vol) vol.106
Number (no) 375
Page pp.-
#Pages 6
Date of Issue