Presentation | 2016-01-21 Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words Takashi Kajiya, Shun Hattori |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Most existing text-proofreading tools detect mis-converted Chinese characters in a target text by checking whether they match prepared examples of mis-conversion. Such a method cannot detect unknown mis-conversions that are absent from the prepared examples. This paper therefore proposes a novel system that extracts clauses from an input sentence by morphological analysis and acquires a contextualized conversion for each clause by choosing the candidate that has the greatest co-occurrence with the surrounding clauses. The proposed system assesses the degree of co-occurrence between clauses by using the enormous number of pages on the exponentially growing Web. Because it does not require a prepared set of mis-conversion examples, unlike most existing text-proofreading tools, the system is capable of detecting unknown mis-converted Chinese characters. The evaluation experiment inputs 200 sentences to the proposed system (100 sentences each containing exactly one mis-conversion, and the 100 corrected sentences without that mis-conversion) and measures its precision in detecting mis-converted Chinese characters. As a result, the system achieved a true-alarm ratio of up to 62% and a stable false-alarm ratio of 4%, depending on the parameter that sets the number of Web pages used to assess co-occurrence. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Mis-conversion Detection / Co-occurrence / Text Proofreading / Morphological Analysis / Web Mining |
Paper # | IN2015-98 |
Date of Issue | 2016-01-14 (IN) |
Conference Information | |
Committee | IN |
---|---|
Conference Date | 2016/1/21 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Nagoya Kigyou Fukushi Kaikan |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Contents Delivery/Contents Exchange, Social Networking Service (SNS), Data Analysis/Processing Platform, Big Data, etc. |
Chair | Hidetsugu Kobayashi(NTT) |
Vice Chair | Katsunori Yamaoka(Tokyo Inst. of Tech.) |
Secretary | Katsunori Yamaoka(NTT) |
Assistant | Yuichi Sudo(NTT) / Kunitake Kaneko(Keio Univ.) |
Paper Information | |
Registration To | Technical Committee on Information Networks |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words |
Sub Title (in English) | |
Keyword(1) | Mis-conversion Detection |
Keyword(2) | Co-occurrence |
Keyword(3) | Text Proofreading |
Keyword(4) | Morphological Analysis |
Keyword(5) | Web Mining |
1st Author's Name | Takashi Kajiya |
1st Author's Affiliation | Muroran Institute of Technology(Muroran Inst. of Tech.) |
2nd Author's Name | Shun Hattori |
2nd Author's Affiliation | Muroran Institute of Technology(Muroran Inst. of Tech.) |
Date | 2016-01-21 |
Paper # | IN2015-98 |
Volume (vol) | vol.115 |
Number (no) | IN-405 |
Page | pp.19-22(IN) |
#Pages | 4 |
Date of Issue | 2016-01-14 (IN) |