Presentation 1998/9/18
Japanese Word Extraction from a Sequence of Similarly Shaped Character Categories
Katsuhiko Itonori, Masaharu Ozaki
Abstract(in Japanese) (See Japanese page)
Abstract(in English) A fast word extraction technique for Japanese document images is described. Rather than classifying each character image as an individual character, it classifies the image into one of a small number of categories, each consisting of similarly shaped characters. Morphological analysis is then performed on the resulting sequence of categories to reduce the character candidates. Finally, detailed classification is performed only on character images that cannot be identified as single characters. In experiments on the learning samples, the accuracy of classification into categories was 99.3% and the speed was eight times faster than traditional Japanese OCRs. In experiments on actual text samples, the classification speed was confirmed to be ten times faster. The morphological analysis effectively reduced the number of character candidates: 85% of characters could be identified as single characters, and the number of candidates was 2.8.
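The three stages named in the abstract (coarse classification into shape categories, candidate reduction with a lexicon, detailed classification of what remains ambiguous) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the category table, the category labels, the five-word lexicon, and the function names are hypothetical, and a whole-word lexicon lookup stands in for real morphological analysis, which would also segment a longer sequence into words.

```python
from collections import defaultdict

# Hypothetical toy categories of similarly shaped characters (not from the paper).
CATEGORIES = {
    "A": {"力", "カ"},  # kanji "power" and katakana "ka"
    "B": {"口", "ロ"},  # kanji "mouth" and katakana "ro"
    "C": {"人", "入"},  # kanji "person" and kanji "enter"
    "D": {"学"},
    "E": {"メ"},
    "F": {"ラ"},
}
CHAR_TO_CAT = {ch: cat for cat, chars in CATEGORIES.items() for ch in chars}

# Hypothetical toy lexicon standing in for a morphological dictionary.
LEXICON = ["入学", "人口", "入口", "カメラ", "学力"]


def build_index(words):
    """Index each lexicon word by its sequence of shape categories."""
    index = defaultdict(list)
    for word in words:
        key = tuple(CHAR_TO_CAT[ch] for ch in word)
        index[key].append(word)
    return index


def reduce_candidates(cat_seq, index):
    """Return the set of candidate characters for each position.

    If no lexicon word matches the category sequence, fall back to
    every character in each category (no reduction is possible).
    """
    words = index.get(tuple(cat_seq), [])
    if not words:
        return [set(CATEGORIES[cat]) for cat in cat_seq]
    return [{word[i] for word in words} for i in range(len(cat_seq))]


def needs_detailed_classification(candidates):
    """Positions that still have more than one candidate character."""
    return [i for i, chars in enumerate(candidates) if len(chars) > 1]


INDEX = build_index(LEXICON)

# The category sequence A, E, F matches only one lexicon word, so every
# position is resolved without running a detailed classifier.
resolved = reduce_candidates(["A", "E", "F"], INDEX)

# The category sequence C, B matches two lexicon words that differ in the
# first character, so only position 0 needs detailed classification.
ambiguous = reduce_candidates(["C", "B"], INDEX)
todo = needs_detailed_classification(ambiguous)
```

In this toy setting the expensive per-character classifier would run only on the positions listed in `todo`, which is the kind of saving the abstract attributes to the category-level approach.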
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Word Extraction / Similarly Shaped Characters / Character Recognition / Document Image / Information Retrieval
Paper # PRMU98-87
Date of Issue

Conference Information
Committee PRMU
Conference Date 1998/9/18 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Pattern Recognition and Media Understanding (PRMU)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Japanese Word Extraction from a Sequence of Similarly Shaped Character Categories
Sub Title (in English)
Keyword(1) Word Extraction
Keyword(2) Similarly Shaped Characters
Keyword(3) Character Recognition
Keyword(4) Document Image
Keyword(5) Information Retrieval
1st Author's Name Katsuhiko Itonori
1st Author's Affiliation Fuji Xerox Co., Ltd. Office Document Products Group
2nd Author's Name Masaharu Ozaki
2nd Author's Affiliation Development Center for IT Businesses
Date 1998/9/18
Paper # PRMU98-87
Volume (vol) vol.98
Number (no) 275
Page pp.-
#Pages 8
Date of Issue