Presentation 2012-06-29
Clustering of Text Documents with Time Series Features
Reiko HAMADA, Shin'ichi SATOH
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Most traditional text clustering methods are based on a "bag of words" representation. However, because topics in closed captions are very short, it is difficult to obtain enough "bag of words" information from them. In this paper, we present a new approach to topic clustering of closed captions. First, we manually clustered one year of closed captions and found that topics categorized as "accidents" or "disasters" show characteristic cluster and time distributions. We therefore propose an unsupervised clustering method that adds temporal information to the conventional word-vector distance. Furthermore, accuracy was improved by changing the linkage criterion as the clustering process progresses. Our experimental results show the effectiveness of combining time and word information.
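The abstract does not give the exact distance formula or say which linkage criteria are switched, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each caption topic is a word-count vector plus a broadcast day, and that the combined distance is cosine distance plus a capped time penalty weighted by a hypothetical `alpha` and time scale `tau` (in days). The agglomerative loop uses single linkage while more than a hypothetical `switch_at` clusters remain and complete linkage afterwards; this schedule is an illustrative assumption.

```python
import math
from collections import Counter


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two word-count vectors (1.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def combined_distance(doc_a, doc_b, alpha=0.5, tau=7.0):
    """Assumed form: word distance plus alpha * (day gap / tau), capped at alpha."""
    words_a, day_a = doc_a
    words_b, day_b = doc_b
    time_term = min(abs(day_a - day_b) / tau, 1.0)
    return cosine_distance(words_a, words_b) + alpha * time_term


def cluster(docs, n_clusters, switch_at, alpha=0.5, tau=7.0):
    """Agglomerative clustering whose linkage changes as clusters merge."""
    n = len(docs)
    d = [[combined_distance(docs[i], docs[j], alpha, tau) for j in range(n)]
         for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        # Hypothetical schedule: single linkage (min) early, complete linkage (max) later.
        link = min if len(clusters) > switch_at else max
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dist = link(d[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or dist < best[0]:
                    best = (dist, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return sorted(sorted(c) for c in clusters)


# Toy topics: (word counts, broadcast day). Invented data for illustration only.
docs = [
    (Counter("earthquake tsunami warning".split()), 0),
    (Counter("earthquake damage report".split()), 1),
    (Counter("election campaign speech".split()), 0),
    (Counter("election results vote".split()), 2),
]
print(cluster(docs, 2, switch_at=3))  # [[0, 1], [2, 3]]
```

Capping the time term keeps topics far apart in time from being pushed arbitrarily far away, so word overlap can still dominate; whether the original method caps, scales, or weights time differently is not stated in the abstract.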
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Text Processing / Clustering / Temporal Information
Paper # NLC2012-2,PRMU2012-22
Date of Issue

Conference Information
Committee NLC
Conference Date 2012/6/22 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Vice Chair

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Clustering of Text Documents with Time Series Features
Sub Title (in English)
Keyword(1) Text Processing
Keyword(2) Clustering
Keyword(3) Temporal Information
1st Author's Name Reiko HAMADA
1st Author's Affiliation National Institute of Informatics
2nd Author's Name Shin'ichi SATOH
2nd Author's Affiliation National Institute of Informatics
Date 2012-06-29
Paper # NLC2012-2,PRMU2012-22
Volume (vol) vol.112
Number (no) 110
#Pages 6