Presentation | 2022-01-27 Successive Binary Partition k-Means Algorithm Akinori Ito, |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | We propose a clustering algorithm that reduces the bias in the number of samples belonging to a cluster. Conventional clustering algorithms, such as the K-means method, aim to inimize the mean square error between the centroid of the cluster and the samples in the cluster, which may lead to a biased number of samples in the resulting cluster. In response to this, several algorithms have been proposed for optimization, such as the Balanced k-means method, which adds the bias in the number of clusters to the objective function in addition to the error between the samples and the centroid. Although these methods are general and can control the trade-off between bias and error, they have a drawback of slow speed. In this paper, we propose a heuristic algorithm that is relatively fast and has a small bias in the number of samples in the clusters. This method uses the k-means method to successively divide clusters with large cluster sizes into two parts, and when the desired number of clusters is reached, the centroid is optimized again. The last optimization allows us to adjust the bias and error of the clusters. Two experiments were conducted for evaluation. In the first experiment, we applied the proposed method and the k-means, k-means++, and balanced k-means methods to the generated data and measured the computation time. The results showed that the proposed method was slower than the k-means method, but faster than the other methods. In the second experiment, the image data from CIFAR-100 were clustered and compared with k-means. We measured the error and cluster size bias, and the results showed that we could control the trade-off between error and cluster size bias. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | clustering / k-means method / balanced k-means method |
Paper # | EMM2021-90 |
Date of Issue | 2022-01-20 (EMM) |
Conference Information | |
Committee | EMM |
---|---|
Conference Date | 2022/1/27(1days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Online |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Sense of Presence, Universal Media, Digital Entertainment, etc. |
Chair | Ryoichi Nishimura(NICT) |
Vice Chair | Masaaki Fujiyoshi(Tokyo Metropolitan Univ.) / Masatsugu Ichino(Univ. of Electro-Comm.) |
Secretary | Masaaki Fujiyoshi(Utsunomiya Univ.) / Masatsugu Ichino(NICT) |
Assistant | Shoko Imaizumi(Chiba Univ.) / Youichi Takashima(Kaishi Professional Univ.) |
Paper Information | |
Registration To | Technical Committee on Enriched MultiMedia |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Successive Binary Partition k-Means Algorithm |
Sub Title (in English) | A clustering algorithm with less sample bias |
Keyword(1) | clustering |
Keyword(2) | k-means method |
Keyword(3) | balanced k-means method |
1st Author's Name | Akinori Ito |
1st Author's Affiliation | Tohoku University(Tohoku Univ.) |
Date | 2022-01-27 |
Paper # | EMM2021-90 |
Volume (vol) | vol.121 |
Number (no) | EMM-362 |
Page | pp.pp.37-42(EMM), |
#Pages | 6 |
Date of Issue | 2022-01-20 (EMM) |