14 Jan 2024 | Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland Astudillo, Shengyuan Cao
This paper provides a comprehensive review of clustering algorithms, which organize data into groups based on inherent patterns and similarities. Clustering is widely used in fields such as marketing, healthcare, and data analysis. The authors classify mainstream clustering algorithms along five dimensions: underlying principles and characteristics, how data points are assigned to clusters, dataset capacity, whether the number of clusters must be predefined, and application areas. This classification helps researchers understand and choose suitable algorithms for specific tasks. The review covers partition-based, hierarchical, density-based, grid-based, and model-based clustering, and compares and evaluates these methods using internal and external metrics. It highlights the importance of adapting algorithms to different data types and complex structures, such as high-dimensional and large-scale datasets. The paper also discusses current trends, future directions, and open challenges; its discussion section emphasizes a shift in research focus towards targeted applications and the integration of deep learning technologies. The paper identifies determining the optimal number of clusters as the primary challenge in clustering and offers insights into various methods for this task. Overall, the review aims to serve as a guide for selecting and applying clustering algorithms effectively.
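To make the point about choosing the number of clusters concrete, the following is a minimal sketch, not a method taken from the paper. It assumes a partition-based algorithm (a plain k-means with a deterministic farthest-point initialization) and uses the silhouette coefficient, one common internal metric, to compare candidate values of k. The toy dataset and all function names (`farthest_point_init`, `kmeans`, `silhouette`) are illustrative assumptions.

```python
import math

def farthest_point_init(points, k):
    # Assumption: deterministic seeding; start at the first point, then
    # repeatedly add the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    # Plain Lloyd iterations: assign each point to its nearest center,
    # then move each center to the mean of its members.
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def silhouette(points, labels):
    # Mean silhouette coefficient; points in singleton clusters score 0.
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy data (an assumption for illustration): three well-separated square blobs.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
blob_centers = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Score each candidate k and keep the one with the highest mean silhouette.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On this toy data the silhouette score peaks at the true number of blobs. On real data the peak is often less clear-cut, which is one reason the choice of k remains an open challenge.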