Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words

Takashi Kajiya; Shun Hattori

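Presentation	2016-01-21: "Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words" by Takashi Kajiya (Muroran Inst. of Tech.) and Shun Hattori (Muroran Inst. of Tech.)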
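Abstract (Japanese, translated)	Many existing text-proofreading tools flag mis-converted kanji in a text by judging whether they match examples of mis-conversion prepared in advance. Such a method, however, cannot flag unknown kanji mis-conversions that are absent from the prepared collection of examples. This paper therefore proposes a system that morphologically analyzes an input sentence, obtains conversion candidates for each extracted clause (bunsetsu), and, for each clause, selects from its candidates the one with the highest co-occurrence with the clauses near it, thereby accurately deriving the correct conversion that fits the context of the text. The degree of co-occurrence, the measure of co-occurrence between clauses, is computed from the ever-growing body of pages on the Web. Moreover, unlike many existing text-proofreading tools, the proposed system does not use prepared examples of mis-conversion, so it may be able to detect unknown kanji mis-conversions as well. In the evaluation experiment, 100 sentences each containing exactly one kanji mis-conversion and the same 100 sentences with that mis-conversion corrected were prepared, all 200 sentences were input to the proposed system, and its accuracy in detecting kanji mis-conversions was measured. Depending on the parameter, the system achieved an error-correction rate (the rate of correcting mis-converted characters) of up to 62% and a false-correction rate (the rate of wrongly correcting correct characters) of 4% in all settings.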
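The clause-level candidate selection described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the co-occurrence formula, so the sketch assumes a Jaccard coefficient over Web page hit counts, sums it over the input clauses within a hypothetical `window` of positions, and replaces real search-engine queries with a made-up in-memory table `TOY_HITS`. All function names and the example sentence (異議を唱える, in which 意義を is a homophone mis-conversion) are likewise hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

# A hit counter maps a set of clauses to the number of Web pages containing all of them.
# A real system would query a search engine; here it is a hypothetical stand-in.
HitCounter = Callable[[FrozenSet[str]], int]


def jaccard_cooccurrence(a: str, b: str, hits: HitCounter) -> float:
    """Assumed co-occurrence measure: Jaccard coefficient over page counts."""
    both = hits(frozenset({a, b}))
    either = hits(frozenset({a})) + hits(frozenset({b})) - both
    return both / either if either > 0 else 0.0


def choose_conversions(clauses: List[str], candidates: Dict[str, List[str]],
                       hits: HitCounter, window: int = 2) -> List[str]:
    """For each clause, keep the candidate whose summed co-occurrence with the
    input clauses at most `window` positions away is highest."""
    chosen = []
    for i, clause in enumerate(clauses):
        neighbors = [c for j, c in enumerate(clauses) if j != i and abs(j - i) <= window]
        options = candidates.get(clause, [clause])  # no alternatives: keep the clause as-is
        chosen.append(max(options, key=lambda cand: sum(
            jaccard_cooccurrence(cand, n, hits) for n in neighbors)))
    return chosen


def detect_misconversions(clauses: List[str], candidates: Dict[str, List[str]],
                          hits: HitCounter, window: int = 2) -> List[int]:
    """Return the indices of clauses whose best candidate differs from the input."""
    chosen = choose_conversions(clauses, candidates, hits, window)
    return [i for i, (orig, best) in enumerate(zip(clauses, chosen)) if orig != best]


# Made-up page counts for illustration only (not measured data).
TOY_HITS: Dict[FrozenSet[str], int] = {
    frozenset({"明日の"}): 5000, frozenset({"会議で"}): 2000,
    frozenset({"意義を"}): 1000, frozenset({"異議を"}): 800, frozenset({"唱える"}): 500,
    frozenset({"意義を", "会議で"}): 60, frozenset({"意義を", "唱える"}): 10,
    frozenset({"異議を", "会議で"}): 50, frozenset({"異議を", "唱える"}): 300,
}
HOMOPHONES = ["意義を", "異議を"]
CANDIDATES = {"意義を": HOMOPHONES, "異議を": HOMOPHONES}


def toy_hits(terms: FrozenSet[str]) -> int:
    """Look up a toy page count; unseen combinations count as zero pages."""
    return TOY_HITS.get(terms, 0)


if __name__ == "__main__":
    wrong = ["明日の", "会議で", "意義を", "唱える"]
    print(detect_misconversions(wrong, CANDIDATES, toy_hits))  # [2]
    print(choose_conversions(wrong, CANDIDATES, toy_hits))
```

Summing over all nearby clauses is only one reasonable reading; the abstract states just that the candidate with the highest co-occurrence with nearby clauses is selected, so a maximum over neighbors or a different Web-based measure would fit the description equally well.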
Abstract (English)	Most existing text-proofreading tools detect mis-converted Chinese characters in a target text by checking whether they match prepared examples of mis-conversion. Such a method, however, cannot detect unknown mis-conversions that do not appear among the prepared examples. This paper therefore proposes a system that extracts clauses by morphologically analyzing an input sentence and obtains a context-appropriate conversion for each clause by choosing the candidate that has the greatest co-occurrence with the surrounding clauses. The system assesses the degree of co-occurrence between clauses using the vast, ever-growing collection of pages on the Web. Because, unlike most existing proofreading tools, it does not require a prepared set of mis-conversion examples, the system may also be able to detect unknown mis-converted Chinese characters. In the evaluation experiment, 200 sentences (100 sentences each containing exactly one mis-conversion, plus the same 100 sentences with that mis-conversion corrected) were input to the system to measure how accurately it detects mis-converted Chinese characters. The system achieved a true-alarm ratio of up to 62%, depending on its parameter (the number of Web pages used to assess co-occurrence), and a false-alarm ratio that remained at 4% across all parameter settings.
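The two evaluation figures reported here can be computed as in the sketch below. The abstract does not define the ratios precisely, so this assumes that a true alarm means the system flags the actual mis-converted clause in one of the 100 erroneous sentences, and a false alarm means it flags any clause in one of the 100 corrected sentences; the `Trial` record and the `alarm_ratios` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Trial:
    """One evaluation sentence (hypothetical record format)."""
    error_index: Optional[int]  # position of the mis-converted clause; None if the sentence is correct
    flagged: Sequence[int]      # clause positions the system flagged


def alarm_ratios(trials: List[Trial]) -> Tuple[float, float]:
    """Return (true-alarm ratio, false-alarm ratio) under the assumed definitions."""
    erroneous = [t for t in trials if t.error_index is not None]
    correct = [t for t in trials if t.error_index is None]
    true_alarms = sum(1 for t in erroneous if t.error_index in t.flagged)
    false_alarms = sum(1 for t in correct if len(t.flagged) > 0)
    true_ratio = true_alarms / len(erroneous) if erroneous else 0.0
    false_ratio = false_alarms / len(correct) if correct else 0.0
    return true_ratio, false_ratio


if __name__ == "__main__":
    toy = [
        Trial(error_index=2, flagged=[2]),     # mis-conversion caught: true alarm
        Trial(error_index=1, flagged=[3]),     # wrong clause flagged: counts as a miss
        Trial(error_index=None, flagged=[]),   # correct sentence left alone
        Trial(error_index=None, flagged=[0]),  # correct sentence flagged: false alarm
    ]
    print(alarm_ratios(toy))  # (0.5, 0.5)
```

Under these definitions, the reported 62% true-alarm ratio corresponds to 62 of the 100 erroneous sentences, and the 4% false-alarm ratio to 4 of the 100 corrected sentences.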
Keywords	Mis-conversion Detection / Co-occurrence / Text Proofreading / Morphological Analysis / Web Mining
Report number	IN2015-98
Issue date	2016-01-14 (IN)

Technical Committee Meeting Information
Technical committee	IN
Dates	2016/1/21 (two-day meeting starting on this date)
Venue	名古屋企業福祉会館 / Nagoya Kigyou Fukushi Kaikan
Theme	Contents Delivery/Contents Exchange, Social Networking Service (SNS), Data Analysis/Processing Platform, Big Data, etc.
Chair	小林秀承 / Hidetsugu Kobayashi (NTT)
Vice chair	山岡克式 / Katsunori Yamaoka (Tokyo Inst. of Tech.)
Secretaries	濱田貴広 / Takahiro Hamada (NTT), 北原武 / Takeshi Kitahara (KDDI)
Assistant secretaries	首藤裕一 / Yuichi Sudo (NTT), 金子晋丈 / Kunitake Kaneko (Keio Univ.)

Presentation Paper Details
Submitted to	Technical Committee on Information Networks
Language of main text	Japanese (JPN)
Title (Japanese)	文章校正における共起語を用いた漢字の誤変換の検出
Subtitle (Japanese)
Title (English)	Detection of Mis-converted Chinese Characters in Text Proofreading by Co-occurrence Words
Subtitle (English)
Keyword 1	Mis-conversion Detection
Keyword 2	Co-occurrence
Keyword 3	Text Proofreading
Keyword 4	Morphological Analysis
Keyword 5	Web Mining
First author (Japanese/English)	梶谷貴士 / Takashi Kajiya
First author affiliation	Muroran Institute of Technology (室蘭工業大学; abbr. Muroran Inst. of Tech.)
Second author (Japanese/English)	服部峻 / Shun Hattori
Second author affiliation	Muroran Institute of Technology (室蘭工業大学; abbr. Muroran Inst. of Tech.)
Presentation date	2016-01-21
Volume (vol.)	vol.115
Number (no.)	IN-405
Page range	pp. 19-22 (IN)
Number of pages	4