Presentation | 1998/9/18 Japanese Word Extraction from a Sequence of Similarly Shaped Character Categories Katsuhiko Itonori, Masaharu Ozaki |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | A fast technique for extracting words from Japanese document images is described. It classifies each character image not into individual characters but into a small number of categories, each consisting of similarly shaped characters. After this classification, it performs morphological analysis on the resulting sequence of categories to reduce the character candidates. Finally, detailed classification is performed on the character images that cannot be identified as single characters. In experiments on the learning samples, the classification accuracy into the categories was 99.3% and the speed was eight times faster than traditional Japanese OCRs. Experiments on actual text samples confirmed that the classification speed is ten times faster for them. The morphological analysis effectively reduced the number of character candidates: 85% of characters could be identified as single characters, and the number of candidates was 2.8. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Word Extraction / Similarly Shaped Characters / Character Recognition / Document Image / Information Retrieval |
Paper # | PRMU98-87 |
Date of Issue |
Conference Information | |
Committee | PRMU |
---|---|
Conference Date | 1998/9/18 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Pattern Recognition and Media Understanding (PRMU) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Japanese Word Extraction from a Sequence of Similarly Shaped Character Categories |
Sub Title (in English) | |
Keyword(1) | Word Extraction |
Keyword(2) | Similarly Shaped Characters |
Keyword(3) | Character Recognition |
Keyword(4) | Document Image |
Keyword(5) | Information Retrieval |
1st Author's Name | Katsuhiko Itonori |
1st Author's Affiliation | Fuji Xerox Co., Ltd. Office Document Products Group |
2nd Author's Name | Masaharu Ozaki |
2nd Author's Affiliation | Development Center for IT Businesses |
Date | 1998/9/18 |
Paper # | PRMU98-87 |
Volume (vol) | vol.98 |
Number (no) | 275 |
Page | pp.- |
#Pages | 8 |
Date of Issue |