[slides] M: Cluster Analysis

The chapter provides a comprehensive overview of cluster analysis, a field that has been studied extensively for over 40 years across many disciplines. It highlights the broad applications of clustering methods and their integration into pattern classification and machine learning. The text references several key textbooks and survey articles on clustering, including works by Hartigan, Jain and Dubes, Kaufman and Rousseeuw, and Arabie, Hubert, and De Soete.
The chapter details various clustering algorithms and techniques, such as the partitioning methods $k$-means and $k$-medoids, $k$-modes for categorical data, and $k$-prototypes for hybrid (mixed categorical and numeric) data. It also discusses hierarchical clustering methods like AGNES and DIANA, and density-based methods like DBSCAN and OPTICS. Grid-based and model-based clustering approaches, including STING and EM, are covered, along with their extensions and applications. The text further explores scalable methods for categorical data, high-dimensional clustering, and streaming data. It mentions the use of constraints in unsupervised clustering, semi-supervised clustering, and outlier detection techniques. Recent advancements in clustering for evolving data streams and high-dimensional data are also discussed, along with frameworks for constraint-based and spatial clustering. Finally, the chapter references specific algorithms and methods, such as the CURE algorithm for large databases and the CLTree method, which recasts clustering as a classification problem.

Data Mining: Concepts and Techniques (2nd edition)

2006 | Jiawei Han and Micheline Kamber