Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words

Takashi Kajiya; Shun Hattori

Presentation	2016-01-21 Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words Takashi Kajiya, Shun Hattori
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Most existing text proofreading tools detect mis-converted Chinese characters in a target text by checking whether they match prepared examples of mis-conversion. Such a method cannot detect unknown mis-conversions that are absent from the prepared examples. This paper therefore proposes a novel system that extracts clauses from an input sentence by morphological analysis and acquires a contextualized conversion for each clause by choosing the candidate that has the greatest co-occurrence with the surrounding clauses. The proposed system assesses the degree of co-occurrence between clauses by using the enormous number of pages on the exponentially growing Web. Because it does not need a prepared set of mis-conversion examples, unlike most existing proofreading tools, the system is capable of detecting unknown mis-converted Chinese characters. The evaluation experiment measures the system's precision in detecting mis-converted Chinese characters by inputting 200 sentences: 100 sentences that each contain exactly one mis-conversion, and the 100 corrected versions of those sentences without it. As a result, the system achieved a true alarm ratio of up to 62% and a stable false alarm ratio of 4%, depending on the parameter that sets the number of Web pages used for assessing co-occurrence.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Mis-conversion Detection / Co-occurrence / Text Proofreading / Morphological Analysis / Web Mining
Paper #	IN2015-98
Date of Issue	2016-01-14 (IN)
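
The selection step described in the abstract (choose, for each clause, the conversion candidate that co-occurs most strongly with the surrounding clauses) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the scoring formula, so summing pairwise counts over a one-clause window is an assumption, and the names `best_candidate`, `detect_misconversions`, `toy_cooccurrence`, `TOY_HITS` and `CANDIDATES` are hypothetical. The hit counts are made-up toy values standing in for Web page counts, and the clause segmentation is written by hand instead of coming from a morphological analyzer.

```python
from typing import Callable, Dict, List, Tuple

# Made-up toy counts standing in for "number of Web pages in which both
# clauses appear". These are NOT real measurements.
TOY_HITS: Dict[frozenset, int] = {
    frozenset(("機械を", "修理する")): 120,
    frozenset(("機械を", "工場の")): 80,
    frozenset(("機会を", "修理する")): 2,
    frozenset(("機会を", "工場の")): 5,
}

# Hypothetical homophone alternatives for each clause (same reading "kikai").
CANDIDATES: Dict[str, List[str]] = {
    "機会を": ["機械を"],
    "機械を": ["機会を"],
}


def toy_cooccurrence(a: str, b: str) -> int:
    """Look up the toy co-occurrence count of two clauses (0 if unknown)."""
    return TOY_HITS.get(frozenset((a, b)), 0)


def best_candidate(
    candidates: List[str],
    context: List[str],
    cooccurrence: Callable[[str, str], int],
) -> str:
    """Return the candidate with the largest summed co-occurrence.

    Assumption: the score is a plain sum over the context clauses.
    max() keeps the first item on ties, so the caller lists the
    original clause first to avoid flagging it on a tie.
    """
    return max(candidates, key=lambda c: sum(cooccurrence(c, w) for w in context))


def detect_misconversions(
    clauses: List[str],
    alternatives: Dict[str, List[str]],
    cooccurrence: Callable[[str, str], int],
    window: int = 1,
) -> List[Tuple[int, str, str]]:
    """Flag clauses for which an alternative conversion fits the context better.

    Returns (index, original clause, suggested clause) triples.
    """
    flagged = []
    for i, clause in enumerate(clauses):
        # Original clause first, then its alternative conversions.
        candidates = [clause] + [a for a in alternatives.get(clause, []) if a != clause]
        # Surrounding clauses within the window, excluding the clause itself.
        context = list(clauses[max(0, i - window):i]) + list(clauses[i + 1:i + 1 + window])
        best = best_candidate(candidates, context, cooccurrence)
        if best != clause:
            flagged.append((i, clause, best))
    return flagged


if __name__ == "__main__":
    # Hand-segmented sentence containing one mis-conversion (機会 for 機械).
    print(detect_misconversions(["工場の", "機会を", "修理する"], CANDIDATES, toy_cooccurrence))
    # Corrected sentence: nothing should be flagged.
    print(detect_misconversions(["工場の", "機械を", "修理する"], CANDIDATES, toy_cooccurrence))
```

In this sketch a sentence raises an alarm whenever the returned list is non-empty, which is one way to read the true alarm and false alarm ratios reported in the abstract. Passing the co-occurrence function as a parameter leaves room to swap the toy table for counts obtained from a real Web search.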

Conference Information
Committee	IN
Conference Date	2016/1/21(2days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Nagoya Kigyou Fukushi Kaikan
Topics (in Japanese)	(See Japanese page)
Topics (in English)	Contents Delivery/Contents Exchange, Social Networking Service (SNS), Data Analysis/Processing Platform, Big Data, etc.
Chair	Hidetsugu Kobayashi(NTT)
Vice Chair	Katsunori Yamaoka(Tokyo Inst. of Tech.)
Secretary	Katsunori Yamaoka(NTT)
Assistant	Yuichi Sudo(NTT) / Kunitake Kaneko(Keio Univ.)

Paper Information
Registration To	Technical Committee on Information Networks
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words
Sub Title (in English)
Keyword(1)	Mis-conversion Detection
Keyword(2)	Co-occurrence
Keyword(3)	Text Proofreading
Keyword(4)	Morphological Analysis
Keyword(5)	Web Mining
1st Author's Name	Takashi Kajiya
1st Author's Affiliation	Muroran Institute of Technology(Muroran Inst. of Tech.)
2nd Author's Name	Shun Hattori
2nd Author's Affiliation	Muroran Institute of Technology(Muroran Inst. of Tech.)
Date	2016-01-21
Paper #	IN2015-98
Volume (vol)	vol.115
Number (no)	IN-405
Page	pp.19-22(IN)
#Pages	4
Date of Issue	2016-01-14 (IN)