Presentation 2011-06-07
Comparisons of document clustering algorithms and criterion functions
Toshio UCHIYAMA, Takeharu EDA, Katsuji BESSHO, Ko FUJIMURA
Abstract(in English) This paper investigates the performance of two criterion functions and four different algorithms for document clustering. The criteria we evaluate are the cosine similarity criterion and the entropy-based criterion. The quality of a clustering solution is evaluated by how the various classes of documents are distributed within each cluster. We present an experimental evaluation covering all combinations of criterion functions and algorithms. Our experimental results show that the entropy-based criterion is superior to the cosine similarity criterion, and that the competitive learning algorithm combined with the entropy-based criterion achieves the best performance.
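The abstract says solutions are scored by how document classes are distributed within each cluster, but it does not give the formula. A common measure of this kind is the size-weighted class entropy of the clusters. The sketch below assumes that measure, normalised by log(q) for q classes so that 0 means every cluster is pure. The function name `clustering_entropy` is hypothetical and is not taken from the paper.

```python
import math
from collections import Counter


def clustering_entropy(cluster_ids, class_labels):
    """Size-weighted class entropy of a clustering (lower is better).

    Assumed measure (not confirmed by the paper): for each cluster r,
    take the entropy of the class distribution among its documents,
    normalise it by log(q) where q is the number of classes, and
    average over clusters with weights n_r / n.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    n = len(class_labels)
    q = len(set(class_labels))

    # Group the true class labels by the cluster each document was assigned to.
    members = {}
    for cluster, label in zip(cluster_ids, class_labels):
        members.setdefault(cluster, []).append(label)

    total = 0.0
    for labels in members.values():
        n_r = len(labels)
        h = 0.0
        for count in Counter(labels).values():
            p = count / n_r
            h -= p * math.log(p)
        if q > 1:
            h /= math.log(q)  # scale into [0, 1]
        total += (n_r / n) * h  # weight by cluster size
    return total
```

For example, `clustering_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])` returns `0.0` because each cluster holds a single class. Putting all four documents in one cluster yields `1.0`, the worst score for two balanced classes.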
Keyword(in English) Clustering / Entropy-based criterion / cosine similarity / Competitive learning / skew-divergence
Paper # DE2011-16,PRMU2011-47

Conference Information
Committee DE
Conference Date 2011/5/30 (1 day)

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in English) Comparisons of document clustering algorithms and criterion functions
Keyword(1) Clustering
Keyword(2) Entropy-based criterion
Keyword(3) cosine similarity
Keyword(4) Competitive learning
Keyword(5) skew-divergence
1st Author's Name Toshio UCHIYAMA
1st Author's Affiliation NTT Cyber Solutions Laboratories, NTT CORPORATION
2nd Author's Name Takeharu EDA
2nd Author's Affiliation NTT Cyber Solutions Laboratories, NTT CORPORATION
3rd Author's Name Katsuji BESSHO
3rd Author's Affiliation NTT Cyber Solutions Laboratories, NTT CORPORATION
4th Author's Name Ko FUJIMURA
4th Author's Affiliation NTT Cyber Solutions Laboratories, NTT CORPORATION
Date 2011-06-07
Volume (vol) vol.111
Number (no) 76
Page pp.-
#Pages 6