Evaluation of the Similarity between Multiple Sentences using Sampling Techniques (Document Classification, Translation)

Presentation	2007/7/17 Evaluation of the Similarity between Multiple Sentences using Sampling Techniques Ichiro YAMADA, Yohei NAKADA, Atsushi MATSUI, Takashi MATSUMOTO, Kikuka MIURA, Hideki SUMIYOSHI, Nobuyuki YAGI
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Closed captions contain many typical expressions that describe specific things, for example the first introduction of a guest in a talk show or the explanation of a place in a travel program. Such information helps in attaching metadata to the corresponding scenes. This paper proposes a method for evaluating the similarity between multiple sentences, in order to extract sections whose sentences are similar to the typical expressions that describe specific things. The first step generates tree structures from an input section of sentences and extracts subtrees from these tree structures. We then use the GibbsBoost algorithm, which samples these subtrees as features and learns from the features to evaluate the similarity. In an experiment on closed captions of travel-related TV programs, which judges whether a section of sentences is similar to a section that explains a place with video, we show the effectiveness of our method.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Metadata generation / Typical expression extraction / Tree Structure analysis / GibbsBoost Algorithm / sampling
Paper #	NLC2007-22
Date of Issue

Paper Information
Registration To	Natural Language Understanding and Models of Communication (NLC)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Evaluation of the Similarity between Multiple Sentences using Sampling Techniques
Sub Title (in English)
Keyword(1)	Metadata generation
Keyword(2)	Typical expression extraction
Keyword(3)	Tree Structure analysis
Keyword(4)	GibbsBoost Algorithm
Keyword(5)	sampling
1st Author's Name	Ichiro YAMADA
1st Author's Affiliation	NHK Science & Technical Research Laboratories
2nd Author's Name	Yohei NAKADA
2nd Author's Affiliation	Dept. of Electrical Engineering and Bioscience, Waseda University
3rd Author's Name	Atsushi MATSUI
3rd Author's Affiliation	NHK Science & Technical Research Laboratories / Dept. of Electrical Engineering and Bioscience, Waseda University
4th Author's Name	Takashi MATSUMOTO
4th Author's Affiliation	Dept. of Electrical Engineering and Bioscience, Waseda University
5th Author's Name	Kikuka MIURA
5th Author's Affiliation	NHK Science & Technical Research Laboratories
6th Author's Name	Hideki SUMIYOSHI
6th Author's Affiliation	NHK Science & Technical Research Laboratories
7th Author's Name	Nobuyuki YAGI
7th Author's Affiliation	NHK Science & Technical Research Laboratories
Date	2007/7/17
Paper #	NLC2007-22
Volume (vol)	vol.107
Number (no)	158
Page	pp.-
#Pages	6