Japanese Word Extraction from a Sequence of Similarly Shaped Character Categories (類似文字による日本語単語抽出)

Presentation	1998/9/18 Japanese Word Extraction from a Sequence of Similarly Shaped Character Categories Katsuhiko Itonori, Masaharu Ozaki
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	A fast technique for extracting words from Japanese document images is described. Rather than identifying each character image as an individual character, it classifies the image into one of a small number of categories, each consisting of similarly shaped characters. It then performs morphological analysis on the resulting category sequence to reduce the character candidates. Finally, detailed classification is applied only to character images that could not be narrowed down to a single character. In experiments on the learning samples, classification into categories reached 99.3% accuracy and ran eight times faster than traditional Japanese OCRs. Experiments on actual text samples confirmed that classification is ten times faster for them. The morphological analysis reduced the number of character candidates effectively: 85% of characters were identified as single characters, and the number of candidates was 2.8.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Word Extraction / Similarly Shaped Characters / Character Recognition / Document Image / Information Retrieval
Paper #	PRMU98-87
Date of Issue
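
The three stages named in the abstract (coarse classification into shape categories, candidate reduction over the category sequence, and detailed classification of whatever stays ambiguous) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the category groupings, the toy lexicon, and the function names are hypothetical, and a plain dictionary lookup on a whole word stands in for the morphological analysis, which the record does not describe in detail.

```python
from collections import defaultdict

# Toy categories of similarly shaped characters (hypothetical groupings).
CATEGORIES = {
    0: "日目",
    1: "本木",
    2: "語話",
    3: "大犬太",
}
CHAR_TO_CATEGORY = {ch: cat for cat, chars in CATEGORIES.items() for ch in chars}

# Toy lexicon standing in for the morphological analyzer's dictionary.
LEXICON = ["日本語", "日本", "大木", "太木"]


def build_index(lexicon):
    """Map each word's category sequence to the words that share it."""
    index = defaultdict(list)
    for word in lexicon:
        key = tuple(CHAR_TO_CATEGORY[ch] for ch in word)
        index[key].append(word)
    return index


def reduce_candidates(category_seq, index):
    """Return, per position, the characters still possible after lexicon lookup."""
    words = index.get(tuple(category_seq), [])
    if not words:
        # No dictionary support: every character in the category stays possible.
        return [set(CATEGORIES[cat]) for cat in category_seq]
    return [{word[i] for word in words} for i in range(len(category_seq))]


def split_resolved(candidates):
    """Separate positions fixed to one character from those that would
    still need detailed classification."""
    resolved = {i: next(iter(c)) for i, c in enumerate(candidates) if len(c) == 1}
    ambiguous = {i: c for i, c in enumerate(candidates) if len(c) > 1}
    return resolved, ambiguous


index = build_index(LEXICON)
resolved, ambiguous = split_resolved(reduce_candidates([3, 1], index))
```

In this toy setup the category sequence `[0, 1, 2]` matches only one lexicon word, so all three positions resolve to single characters without any detailed classification, while `[3, 1]` leaves the first position with two candidates that a detailed classifier would have to separate.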

Paper Information
Registration To	Pattern Recognition and Media Understanding (PRMU)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Japanese Word Extraction from a Sequence of Similarly Shaped Character Categories
Sub Title (in English)
Keyword(1)	Word Extraction
Keyword(2)	Similarly Shaped Characters
Keyword(3)	Character Recognition
Keyword(4)	Document Image
Keyword(5)	Information Retrieval
1st Author's Name	Katsuhiko Itonori
1st Author's Affiliation	Fuji Xerox Co., Ltd. Office Document Products Group
2nd Author's Name	Masaharu Ozaki
2nd Author's Affiliation	Development Center for IT Businesses
Date	1998/9/18
Paper #	PRMU98-87
Volume (vol)	vol.98
Number (no)	275
Page	pp.-
#Pages	8