Clustering of Text Documents with Time Series Features (Theme Session: The Boundary between Natural Language and Pattern Recognition)

Presentation	2012-06-29 Clustering of Text Documents with Time Series Features Reiko HAMADA, Shin'ichi SATOH,
Abstract(in English)	Most traditional text clustering methods rely on a "bag of words" representation. However, because topics in closed captions are very short, they rarely supply enough bag-of-words information. In this paper, we present a new approach to topic clustering of closed captions. First, we manually clustered one year of closed captions and found that topics categorized as "accidents" or "disasters" show characteristic cluster and time distributions. We therefore propose an unsupervised clustering method that adds temporal information to the conventional word-vector distance. Furthermore, accuracy is improved by changing the linkage criterion as the clustering process progresses. Our experimental results show the effectiveness of combining time and word information.
Keyword(in English)	Text Processing / Clustering / Temporal Information
Paper #	NLC2012-2,PRMU2012-22
Date of Issue
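The abstract describes the method only at a high level, so the following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation: agglomerative clustering in which the bag-of-words cosine distance between two captions is augmented with a capped time-gap penalty, and the linkage criterion changes once few clusters remain. The weight `alpha`, the time scale `tau` (in days), the switch point `switch_at`, the single-to-average linkage schedule, and the whitespace tokenizer `bow` are all hypothetical choices; real Japanese captions would need a morphological analyzer instead of `split()`.

```python
import math
from collections import Counter


def bow(text):
    """Hypothetical tokenizer: lowercase whitespace split (Japanese needs a morphological analyzer)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity of two word-count vectors (1.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def combined_distance(doc_a, doc_b, alpha=0.5, tau=7.0):
    """Word distance plus a time penalty that grows with the day gap and is capped at alpha.

    Each doc is a (Counter, day_index) pair; alpha and tau are hypothetical parameters.
    """
    (words_a, day_a), (words_b, day_b) = doc_a, doc_b
    time_term = min(abs(day_a - day_b) / tau, 1.0)
    return cosine_distance(words_a, words_b) + alpha * time_term


def cluster(docs, n_clusters, switch_at=None, alpha=0.5, tau=7.0):
    """Agglomerative clustering whose linkage changes as merging progresses.

    Hypothetical schedule: single linkage while more than `switch_at` clusters
    remain, average linkage afterwards.
    """
    if switch_at is None:
        switch_at = 2 * n_clusters
    n = len(docs)
    dist = [[combined_distance(docs[i], docs[j], alpha, tau) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        use_single = len(clusters) > switch_at
        best = None  # (score, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                score = min(pair) if use_single else sum(pair) / len(pair)
                if best is None or score < best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge b into a; a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]


docs = [
    (bow("earthquake hits coastal town"), 0),
    (bow("earthquake damage coastal town"), 1),
    (bow("train accident injures passengers"), 40),
    (bow("train accident investigation passengers"), 41),
]
print(cluster(docs, n_clusters=2, switch_at=3))  # [[0, 1], [2, 3]]
```

Capping the time term keeps it from swamping the word distance: any two captions more than `tau` days apart receive the same penalty, so word overlap still decides among them, while captions close in time are pulled together.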

Paper Information
Registration To	Natural Language Understanding and Models of Communication (NLC)
Language	JPN
Title (in English)	Clustering of Text Documents with Time Series Features
Sub Title (in English)
Keyword(1)	Text Processing
Keyword(2)	Clustering
Keyword(3)	Temporal Information
1st Author's Name	Reiko HAMADA
1st Author's Affiliation	National Institute of Informatics
2nd Author's Name	Shin'ichi SATOH
2nd Author's Affiliation	National Institute of Informatics
Date	2012-06-29
Paper #	NLC2012-2,PRMU2012-22
Volume (vol)	vol.112
Number (no)	110
Page	pp.-
#Pages	6
Date of Issue