Presentation 2022-01-27
Successive Binary Partition k-Means Algorithm
Akinori Ito,
PDF Download Page PDF download Page Link
Abstract(in Japanese) (See Japanese page)
Abstract(in English) We propose a clustering algorithm that reduces the bias in the number of samples belonging to a cluster. Conventional clustering algorithms, such as the K-means method, aim to inimize the mean square error between the centroid of the cluster and the samples in the cluster, which may lead to a biased number of samples in the resulting cluster. In response to this, several algorithms have been proposed for optimization, such as the Balanced k-means method, which adds the bias in the number of clusters to the objective function in addition to the error between the samples and the centroid. Although these methods are general and can control the trade-off between bias and error, they have a drawback of slow speed. In this paper, we propose a heuristic algorithm that is relatively fast and has a small bias in the number of samples in the clusters. This method uses the k-means method to successively divide clusters with large cluster sizes into two parts, and when the desired number of clusters is reached, the centroid is optimized again. The last optimization allows us to adjust the bias and error of the clusters. Two experiments were conducted for evaluation. In the first experiment, we applied the proposed method and the k-means, k-means++, and balanced k-means methods to the generated data and measured the computation time. The results showed that the proposed method was slower than the k-means method, but faster than the other methods. In the second experiment, the image data from CIFAR-100 were clustered and compared with k-means. We measured the error and cluster size bias, and the results showed that we could control the trade-off between error and cluster size bias.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) clustering / k-means method / balanced k-means method
Paper # EMM2021-90
Date of Issue 2022-01-20 (EMM)

Conference Information
Committee EMM
Conference Date 2022/1/27(1days)
Place (in Japanese) (See Japanese page)
Place (in English) Online
Topics (in Japanese) (See Japanese page)
Topics (in English) Sense of Presence, Universal Media, Digital Entertainment, etc.
Chair Ryoichi Nishimura(NICT)
Vice Chair Masaaki Fujiyoshi(Tokyo Metropolitan Univ.) / Masatsugu Ichino(Univ. of Electro-Comm.)
Secretary Masaaki Fujiyoshi(Utsunomiya Univ.) / Masatsugu Ichino(NICT)
Assistant Shoko Imaizumi(Chiba Univ.) / Youichi Takashima(Kaishi Professional Univ.)

Paper Information
Registration To Technical Committee on Enriched MultiMedia
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Successive Binary Partition k-Means Algorithm
Sub Title (in English) A clustering algorithm with less sample bias
Keyword(1) clustering
Keyword(2) k-means method
Keyword(3) balanced k-means method
1st Author's Name Akinori Ito
1st Author's Affiliation Tohoku University(Tohoku Univ.)
Date 2022-01-27
Paper # EMM2021-90
Volume (vol) vol.121
Number (no) EMM-362
Page pp.pp.37-42(EMM),
#Pages 6
Date of Issue 2022-01-20 (EMM)