Presentation 2016-01-21
Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words
Takashi Kajiya, Shun Hattori
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Most existing text proofreading tools detect mis-converted Chinese characters in a target text by checking whether they match prepared examples of mis-conversion. Such a method cannot detect unknown mis-conversions that are absent from the prepared examples. This paper therefore proposes a novel system that extracts clauses from an input sentence by morphological analysis and obtains a contextualized conversion for each clause by choosing, among its conversion candidates, the one that has the greatest co-occurrence with the surrounding clauses. The proposed system assesses the degree of co-occurrence between clauses by using the enormous number of pages on the exponentially growing Web. Because the system does not need a prepared set of mis-conversion examples, unlike most existing proofreading tools, it is capable of detecting unknown mis-converted Chinese characters. The evaluation experiment measures the system's precision in detecting mis-converted Chinese characters by inputting 200 sentences: 100 sentences that each contain exactly one mis-conversion, and the 100 corrected versions of those sentences without it. As a result, depending on the parameter that sets the number of Web pages used for assessing co-occurrence, the system achieved a true alarm ratio of up to 62%, while the false alarm ratio remained stable at 4%.
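The candidate-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how co-occurrence is computed, so the Jaccard-style measure, the function names (`cooccurrence`, `best_candidate`, `is_misconverted`), and the `toy_hits` lookup that stands in for Web page counts are all assumptions, and the hit counts are made-up illustrative numbers.

```python
from typing import Callable, Dict, FrozenSet, Sequence

# Hypothetical interface: number of Web pages that contain all given terms.
HitCount = Callable[[Sequence[str]], int]


def cooccurrence(a: str, b: str, hits: HitCount) -> float:
    """Assumed Jaccard-style measure: pages with both / pages with either."""
    both = hits([a, b])
    either = hits([a]) + hits([b]) - both
    return both / either if either > 0 else 0.0


def best_candidate(candidates: Sequence[str], context: Sequence[str],
                   hits: HitCount) -> str:
    """Pick the conversion candidate that co-occurs most with the context."""
    return max(candidates,
               key=lambda c: sum(cooccurrence(c, ctx, hits) for ctx in context))


def is_misconverted(written: str, candidates: Sequence[str],
                    context: Sequence[str], hits: HitCount) -> bool:
    """Flag the written form if another candidate fits the context better."""
    return best_candidate(candidates, context, hits) != written


# Made-up page counts for two homophones read "igai" and one context word.
TOY_HITS: Dict[FrozenSet[str], int] = {
    frozenset(["意外"]): 1000,          # "unexpected"
    frozenset(["以外"]): 2000,          # "other than"
    frozenset(["結果"]): 5000,          # "result"
    frozenset(["意外", "結果"]): 400,
    frozenset(["以外", "結果"]): 100,
}


def toy_hits(terms: Sequence[str]) -> int:
    """Stand-in for a Web search; unknown term sets count as zero pages."""
    return TOY_HITS.get(frozenset(terms), 0)


candidates = ["意外", "以外"]
context = ["結果"]
print(best_candidate(candidates, context, toy_hits))
print(is_misconverted("以外", candidates, context, toy_hits))
```

In this sketch no list of known mis-conversions is needed: a written form is flagged only because a homophone candidate co-occurs more strongly with its neighbours, which is the property the abstract credits for detecting unknown mis-conversions.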
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Mis-conversion Detection / Co-occurrence / Text Proofreading / Morphological Analysis / Web Mining
Paper # IN2015-98
Date of Issue 2016-01-14 (IN)

Conference Information
Committee IN
Conference Date 2016/1/21(2days)
Place (in Japanese) (See Japanese page)
Place (in English) Nagoya Kigyou Fukushi Kaikan
Topics (in Japanese) (See Japanese page)
Topics (in English) Contents Delivery/Contents Exchange, Social Networking Service (SNS), Data Analysis/Processing Platform, Big Data, etc.
Chair Hidetsugu Kobayashi(NTT)
Vice Chair Katsunori Yamaoka(Tokyo Inst. of Tech.)
Secretary Katsunori Yamaoka(NTT)
Assistant Yuichi Sudo(NTT) / Kunitake Kaneko(Keio Univ.)

Paper Information
Registration To Technical Committee on Information Networks
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words
Sub Title (in English)
Keyword(1) Mis-conversion Detection
Keyword(2) Co-occurrence
Keyword(3) Text Proofreading
Keyword(4) Morphological Analysis
Keyword(5) Web Mining
1st Author's Name Takashi Kajiya
1st Author's Affiliation Muroran Institute of Technology(Muroran Inst. of Tech.)
2nd Author's Name Shun Hattori
2nd Author's Affiliation Muroran Institute of Technology(Muroran Inst. of Tech.)
Date 2016-01-21
Paper # IN2015-98
Volume (vol) vol.115
Number (no) IN-405
Page pp.19-22(IN)
#Pages 4
Date of Issue 2016-01-14 (IN)