Segmentation of Multi-Character Compound Words Using Contextual Information

Presentation	2001/5/4 Segmentation of Multi-Character Compound Words Using Contextual Information. Dongli Han, Koichi Kato, Teiji Furugori
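Abstract (translated from Japanese)	Analyzing compound words that are not registered in the dictionary is one of the difficult problems that must be solved to build practical natural language systems. In this paper, we target long kanji strings (multi-character compound words) and report an experiment that segments compounds into unit words using co-occurrence information between the words into which a compound can be split. In computing the co-occurrence information, we considered not only the compound itself but also the context containing it. The experiment achieved a segmentation accuracy of over 90%.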
Abstract (English)	Analyzing compound words is one of the crucial problems in constructing practical natural language processing systems. In this paper, we propose a method for segmenting compound words, each consisting of a long sequence of Kanji characters, in text by using statistics on word co-occurrences. We conducted an experiment that used the co-occurrence information within the compound word and the context in which it appeared. The results show a success rate of over 90% in dividing the compound words into their unit words.
Keywords (Japanese)	多文字複合語 / 分割 / 共起情報 / 相互情報量 / 文脈情報
Keywords (English)	Compound word / Segmentation / Co-occurrence / Mutual information / Contextual information
Report number	NLC2001-5
Publication date
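
The abstract describes choosing a segmentation from co-occurrence statistics (the keywords name mutual information) gathered from the compound and its surrounding context. The record does not give the authors' actual formulas, so the following is only a minimal sketch of that general idea, not the paper's method. It assumes sentence-level co-occurrence counts, pointwise mutual information averaged over adjacent unit words, and a fixed penalty for unseen pairs. The lexicon, the context sentences, the `UNSEEN_SCORE` value, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Assumption: fixed penalty for adjacent words never seen together in context.
UNSEEN_SCORE = -10.0


def build_counts(sentences):
    """Count words and unordered word pairs, once per context sentence."""
    word_counts, pair_counts = Counter(), Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return word_counts, pair_counts, len(sentences)


def pmi(a, b, word_counts, pair_counts, total):
    """Pointwise mutual information log2(P(a,b) / (P(a) P(b)))."""
    joint = pair_counts[tuple(sorted((a, b)))]
    if joint == 0:
        return UNSEEN_SCORE
    return math.log2(joint * total / (word_counts[a] * word_counts[b]))


def candidate_splits(compound, lexicon):
    """Return every way to cut `compound` into words found in `lexicon`."""
    if not compound:
        return [[]]
    results = []
    for end in range(1, len(compound) + 1):
        head = compound[:end]
        if head in lexicon:
            for rest in candidate_splits(compound[end:], lexicon):
                results.append([head] + rest)
    return results


def score_split(words, word_counts, pair_counts, total):
    """Average PMI over adjacent unit words in one candidate split."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return UNSEEN_SCORE
    return sum(pmi(a, b, word_counts, pair_counts, total) for a, b in pairs) / len(pairs)


def segment(compound, lexicon, sentences):
    """Pick the candidate split with the highest average PMI."""
    word_counts, pair_counts, total = build_counts(sentences)
    candidates = candidate_splits(compound, lexicon)
    if not candidates:
        return [compound]  # no split is possible with this lexicon
    return max(candidates, key=lambda ws: score_split(ws, word_counts, pair_counts, total))


# Hypothetical toy data: a unit-word lexicon and pre-tokenized context sentences.
LEXICON = {"東京", "都庁", "東", "京都", "庁"}
CONTEXT = [
    ["東京", "都庁", "新宿"],
    ["東京", "都庁", "職員"],
    ["東京", "観光"],
    ["京都", "観光", "寺"],
    ["京都", "寺"],
    ["東", "方角"],
    ["庁", "省"],
    ["新宿", "観光"],
]

if __name__ == "__main__":
    print(segment("東京都庁", LEXICON, CONTEXT))
```

With this toy data the lexicon allows two readings of 東京都庁 (東 / 京都 / 庁 and 東京 / 都庁), and the context counts favor the second because 東京 and 都庁 co-occur while the words of the first reading never do. Exhaustive enumeration is used only for brevity; a long compound would call for dynamic programming instead.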

Presentation and Paper Details
Technical committee	Natural Language Understanding and Models of Communication (NLC)
Language of paper	JPN
Title (Japanese)	文脈情報を利用した多文字複合語の分割
Subtitle (Japanese)
Title (English)	Automatic Segmentation of Compound Word in Japanese using Contextual Information
Subtitle (English)
Keyword (1) (Japanese/English)	多文字複合語 / Compound word
Keyword (2) (Japanese/English)	分割 / Segmentation
Keyword (3) (Japanese/English)	共起情報 / Co-occurrence
Keyword (4) (Japanese/English)	相互情報量 / Mutual information
Keyword (5) (Japanese/English)	文脈情報 / Contextual information
First author name (Japanese/English)	韓東力 / Dongli Han
First author affiliation (Japanese/English)	電気通信大学情報工学科 / Department of Computer Science, The University of Electro-Communications
Second author name (Japanese/English)	加藤浩一 / Koichi Kato
Second author affiliation (Japanese/English)	電気通信大学情報工学科 / Department of Computer Science, The University of Electro-Communications
Third author name (Japanese/English)	古郡廷治 / Teiji Furugori
Third author affiliation (Japanese/English)	電気通信大学情報工学科 / Department of Computer Science, The University of Electro-Communications
Presentation date	2001/5/4
Report number	NLC2001-5
Volume (vol)	vol.101
Issue (no)	40
Page range	pp.-
Number of pages	6
Publication date