The k-means Algorithm: A Comprehensive Survey and Performance Evaluation

12 August 2020 | Mohiuddin Ahmed, Raihan Seraj and Syed Mohammed Shamsul Islam
This paper surveys the k-means clustering algorithm and evaluates its variants, focusing on how well they address two key challenges: initialization and mixed data types. k-means is a popular unsupervised learning method for clustering data, but it is sensitive to the initial placement of centroids and cannot handle non-numeric data. The paper reviews existing solutions to these problems, including k-means variants that improve initialization and variants that adapt the algorithm to mixed data types.
It also presents an experimental analysis of these variants on six benchmark datasets, evaluating their performance with metrics such as accuracy and the Adjusted Rand Index (ARI). The results show that no single algorithm consistently outperforms the others across all datasets, so the appropriate variant should be selected based on the specific characteristics of the data. The paper also discusses the computational complexity of the different variants, noting that some are more efficient on large datasets. Overall, the study emphasizes the need for further research into robust k-means algorithms that address both the initialization and mixed-data challenges, contributing to clustering techniques for big data applications.
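To make the initialization sensitivity and the ARI metric concrete, here is a minimal sketch in pure Python. It is not the paper's code and does not use its benchmark datasets: the toy points, the two starting configurations, and the helper names (`sq_dist`, `kmeans`, `adjusted_rand_index`, `sse`) are all assumptions made for illustration. It runs plain Lloyd-style k-means from explicitly supplied starting centroids and scores each result against known labels.

```python
from collections import Counter
from math import comb


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd-style k-means started from caller-supplied centroids (hypothetical helper)."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index computed from the contingency counts of two labelings."""
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(len(truth), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)


def sse(points, labels, centroids):
    """Within-cluster sum of squared errors, the objective k-means minimizes."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy data (an assumption, not a benchmark dataset): three well-separated 2-D blobs.
blob = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
points = [(x + dx, y + dy) for dx, dy in [(0, 0), (10, 0), (0, 10)] for x, y in blob]
truth = [i // 4 for i in range(len(points))]

# One starting centroid per blob versus two starting centroids inside the same blob.
good_labels, good_centroids = kmeans(points, [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)])
bad_labels, bad_centroids = kmeans(points, [(0.0, 0.0), (10.0, 0.0), (11.0, 0.0)])

good_ari = adjusted_rand_index(truth, good_labels)
bad_ari = adjusted_rand_index(truth, bad_labels)
print("good init: ARI", round(good_ari, 3), "SSE", sse(points, good_labels, good_centroids))
print("bad init:  ARI", round(bad_ari, 3), "SSE", sse(points, bad_labels, bad_centroids))
```

With one starting centroid per blob, the run recovers the true grouping. With two starting centroids in the same blob, it converges to a poorer local optimum that merges two blobs and splits the third, which lowers the ARI and raises the SSE. This is the kind of failure that initialization-focused variants aim to avoid.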