Presentation 2003/10/30
Evaluation of the Document Clustering Method Based on Commonality Analysis of Multiple Documents (Natural Language Understanding and Models of Communication)
Takahiko KAWATANI
Abstract(in Japanese) (See Japanese page)
Abstract(in English) This paper evaluates a non-hierarchical clustering method, proposed by the author, that is based on multi-document commonality analysis. In the method, a document extracted as a seed grows into a cluster by iteratively merging documents on the same topic. Its distinguishing features lie in how document-cluster similarity is obtained: it uses a new similarity measure that reflects term co-occurrence information, and it uses specific terms and term pairs extracted from the current cluster. In experiments using 7546 documents drawn from 38 events in the TDT2 corpus, 36 events were extracted as clusters with 94.41% clustering accuracy.
Keyword(in Japanese) (See Japanese page)
Keyword(in English)
Paper # NLC2003-31
Date of Issue
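
The seed-growing loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the paper's own similarity measure (which reflects term co-occurrence and cluster-specific terms and term pairs) is not given in this record, so plain cosine similarity over term counts is swapped in, and the names `tf_vector`, `cosine`, `grow_cluster`, the threshold of 0.3, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-count vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def grow_cluster(docs, seed, threshold=0.3, max_iters=10):
    """Grow a cluster from one seed document (hypothetical helper).

    Each pass rebuilds the cluster profile from its current members and
    merges every remaining document whose similarity to that profile
    reaches the threshold. The loop stops when a pass adds nothing.
    Cosine similarity is a stand-in, not the measure from the paper.
    """
    vectors = [tf_vector(d) for d in docs]
    members = {seed}
    for _ in range(max_iters):
        profile = Counter()
        for i in members:
            profile.update(vectors[i])  # sum term counts over the cluster
        added = {
            i
            for i in range(len(docs))
            if i not in members and cosine(vectors[i], profile) >= threshold
        }
        if not added:
            break
        members |= added
    return sorted(members)


# Made-up sample documents covering two topics.
docs = [
    "earthquake hits city rescue teams respond",
    "rescue teams search city after earthquake",
    "earthquake rescue effort continues in city",
    "stock market falls as investors sell shares",
    "investors sell shares and stock market drops",
]
cluster = grow_cluster(docs, seed=0)
print(cluster)  # [0, 1, 2]
```

Recomputing the profile on every pass is what lets the cluster's vocabulary shift as documents are merged, which is the general idea behind using terms extracted from the current cluster; the actual term and term-pair selection in the paper is not reproduced here.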

Conference Information
Committee NLC
Conference Date 2003/10/30 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Natural Language Understanding and Models of Communication (NLC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Evaluation of the Document Clustering Method Based on Commonality Analysis of Multiple Documents (Natural Language Understanding and Models of Communication)
Sub Title (in English)
Keyword(1)
1st Author's Name Takahiko KAWATANI
1st Author's Affiliation Hewlett-Packard Labs Japan, Hewlett-Packard Japan
Date 2003/10/30
Paper # NLC2003-31
Volume (vol) vol.103
Number (no) 407
Page pp.-
#Pages 8
Date of Issue