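Automatic acquisition of hyponymy-relations from HTML documents (Acquisition, Dictionaries) (Natural Language Understanding and Models of Communication)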

Keiji SHINZATO; Kentaro TORISAWA

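Presentation	2003/10/31 Automatic acquisition of hyponymy-relations from HTML documents (Acquisition, Dictionaries) (Natural Language Understanding and Models of Communication), Keiji SHINZATO, Kentaro TORISAWA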
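Abstract (translated from Japanese)	This paper proposes a method for automatically acquiring hyponymy relations between words from documents on the WWW. Hyponymy relations have long been regarded as important knowledge in natural language processing, and many automatic acquisition methods have been proposed. Most of them focus on surface patterns in sentences, such as the juxtaposition of noun phrases. The method proposed here takes a different approach. Specifically, it aims to acquire hyponymy relations automatically by combining three kinds of information: 1) repeated patterns of HTML tags on the Web, 2) statistics such as DF and IDF that have long been used in information retrieval, and 3) the dependency relations that nouns have, mainly with verbs.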
Abstract (English)	This paper describes an automatic acquisition method for hyponymy relations. Hyponymy relations play a crucial role in various natural language processing systems, and there have been many attempts to acquire them automatically from large-scale corpora. Most existing acquisition methods rely on particular linguistic patterns, such as juxtapositions, that signal hyponymy relations. Our method, however, does not use such linguistic patterns. We try to acquire hyponymy relations from three different types of clues. The first is repetitions of HTML tags found in ordinary HTML documents on the WWW. The second is statistical measures such as DF and IDF, which are popular in the IR literature. The third is verb-noun co-occurrences found in ordinary corpora.
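Keywords (translated from Japanese)	Knowledge acquisition / Hypernym / Hyponym / Statistical natural language processing / WWW
Keywords (English)	Knowledge acquisition / Hypernym, Hyponym / Statistical Natural Language Processing / World Wide Web
Report number	NLC2003-39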

Technical Committee Information
Technical committee	NLC
Meeting dates	2003/10/31 (one-day meeting)

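Paper Details
Submitted to	Natural Language Understanding and Models of Communication (NLC)
Language of text	JPN
Title (translated from Japanese)	Automatic acquisition of hyponymy relations between words from HTML documents (Acquisition, Dictionaries) (Natural Language Understanding and Models of Communication)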
Title (English)	Automatic acquisition of hyponymy-relations from HTML documents (Natural Language Understanding and Models of Communication)
Keyword (1)	Knowledge acquisition
Keyword (2)	Hypernym
Keyword (3)	Hyponym
Keyword (4)	Statistical Natural Language Processing
Keyword (5)	World Wide Web (WWW)
First author name	Keiji SHINZATO (新里圭司)
First author affiliation	Graduate School of Information Sciences, Japan Advanced Institute of Science and Technology (北陸先端科学技術大学院大学情報科学研究科)
Second author name	Kentaro TORISAWA (鳥澤健太郎)
Second author affiliation	Graduate School of Information Sciences, Japan Advanced Institute of Science and Technology (北陸先端科学技術大学院大学情報科学研究科)
Presentation date	2003/10/31
Report number	NLC2003-39
Volume	vol.103
Issue	408
Page range	pp.-
Number of pages	8