Extraction of Important Words in the Document Set

Presentation	1998/5/13 Extraction of Important Words in the Document Set. Atsunobu Koizumi, Takashi Okuda, Syuich Itoh
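Abstract (Japanese)	This paper describes a method for extracting important words from a given document set based on word frequency information. Here, an important word is a word that helps distinguish a document from the other documents in the set and that represents the content of a document. For each word w, the divergence between p(d|w), the conditional distribution of documents d given the word w, and p(d), the distribution of documents given as a prior, is taken as the weight of w, and important words are extracted according to this weight. Experiments were carried out with the proposed method to examine its properties.
Abstract (English)	This paper describes a method for extracting important words from a document set using word frequency. We propose the notion of an important word, which represents the inter-document characteristics within the given document set and the content of the documents. The Kullback-Leibler divergence between p(d|w), the probability of a document conditioned on a word, and p(d) is calculated for each word, and the words are ranked by this quantity. Experimental results are shown and discussed.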
Keywords (Japanese)	重要語 / 自動抽出 / 文書解析 / ダイバージェンス
Keywords (English)	important word / auto extraction / document analysis / divergence
Report number
Publication date
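
The weighting described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the estimators, so the sketch assumes that p(d|w) is the share of the occurrences of w that fall in document d, that the prior p(d) is uniform unless supplied, and that the divergence is taken in the direction D(p(d|w) || p(d)). The function name `word_weights` and its parameters are hypothetical.

```python
import math
from collections import Counter


def word_weights(docs, prior=None):
    """Return {word: D(p(d|w) || p(d))} for a list of tokenized documents.

    Assumptions (not stated in the abstract):
      - p(d|w) = count of w in d / count of w in the whole set
      - p(d) defaults to the uniform distribution over documents
    """
    n_docs = len(docs)
    if prior is None:
        prior = [1.0 / n_docs] * n_docs  # assumed uniform prior p(d)

    counts = [Counter(doc) for doc in docs]
    vocab = set().union(*counts)

    weights = {}
    for w in vocab:
        total = sum(c[w] for c in counts)  # occurrences of w in the set
        kl = 0.0
        for c, p_d in zip(counts, prior):
            p_dw = c[w] / total
            if p_dw > 0.0:  # terms with p(d|w) = 0 contribute nothing
                kl += p_dw * math.log(p_dw / p_d)
        weights[w] = kl
    return weights


if __name__ == "__main__":
    docs = [["a", "b", "b"], ["a", "c"]]
    weights = word_weights(docs)
    # Rank words from most to least document-specific.
    for word in sorted(weights, key=weights.get, reverse=True):
        print(word, round(weights[word], 4))
```

Under these assumptions, a word spread evenly over the documents ("a" in the example) gets weight 0, while a word confined to a single document gets the largest weight, which matches the idea of an important word as one that helps tell documents apart.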

Presentation paper details
Technical committee	Data Engineering (DE)
Paper language	JPN
Title (Japanese)	文書集合における重要語の抽出
Subtitle (Japanese)
Title (English)	Extraction of Important Words in the Document Set
Subtitle (English)
Keyword (1) (Japanese/English)	重要語 / important word
Keyword (2) (Japanese/English)	自動抽出 / auto extraction
Keyword (3) (Japanese/English)	文書解析 / document analysis
Keyword (4) (Japanese/English)	ダイバージェンス / divergence
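1st author name (Japanese/English)	小泉敦延 / Atsunobu Koizumi
1st author affiliation (Japanese/English)	電気通信大学大学院情報システム研究科 / Graduate School of Information Systems, University of Electro-Communications
2nd author name (Japanese/English)	奥田敬 / Takashi Okuda
2nd author affiliation (Japanese/English)	富士通 / Fujitsu
3rd author name (Japanese/English)	伊藤秀一 / Syuich Itoh
3rd author affiliation (Japanese/English)	電気通信大学大学院情報システム研究科 / Graduate School of Information Systems, University of Electro-Communications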
Presentation date	1998/5/13
Report number
Volume (vol)	vol.98
Issue (no)	42
Page range	pp.-
Number of pages	6
Publication date