This chapter surveys clustering methods and algorithms used in data mining. Clustering has been studied for over 40 years and is applied across many disciplines. Textbooks and survey articles on the topic include works by Hartigan; Jain and Dubes; Kaufman and Rousseeuw; and Arabie, Hubert, and De Soete.
Partitioning methods include the k-means algorithm, introduced by Lloyd and later by MacQueen, and the k-medoids algorithms PAM and CLARA by Kaufman and Rousseeuw. The k-modes algorithm for categorical data and the k-prototypes algorithm for hybrid (mixed categorical and numerical) data were proposed by Huang. The CLARANS algorithm was introduced by Ng and Han, and techniques for improving its performance were developed by Ester, Kriegel, and Xu.
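To make the partitioning idea concrete, the following is a minimal sketch of the k-means iteration in plain Python, alternating between assigning points to their nearest center and recomputing each center as the mean of its members. The function name `kmeans`, its parameters, and the choice of random input points as initial centers are assumptions made for illustration, not the formulation of any particular paper cited above.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Illustrative k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k input points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
```

On this toy input the two well-separated groups end up in different clusters. The result of k-means generally depends on the initial centers, so practical implementations usually try several starts.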
Agglomerative and divisive hierarchical clustering, as represented by AGNES and DIANA respectively, were introduced by Kaufman and Rousseeuw. BIRCH, developed by Zhang, Ramakrishnan, and Livny, summarizes the data in a clustering feature tree (CF-tree) and clusters hierarchically on those summaries. Other hierarchical clustering methods include CURE, ROCK, and Chameleon.
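The summary kept in each CF-tree entry can be illustrated with a short sketch. A clustering feature holds the number of points, their linear sum, and their squared sum, and two features merge by simple addition, which is what lets BIRCH absorb points incrementally. The class and method names below are hypothetical, and the sketch covers only the feature itself, not tree construction.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass(frozen=True)
class CF:
    n: int      # number of points summarized
    ls: tuple   # linear sum of the points, per dimension
    ss: float   # sum of the squared norms of the points

    @staticmethod
    def of_point(p):
        return CF(1, tuple(p), sum(x * x for x in p))

    def merge(self, other):
        # Additivity: the CF of a union is the component-wise sum.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance of the summarized points from the centroid.
        c2 = sum(x * x for x in self.centroid())
        return sqrt(max(self.ss / self.n - c2, 0.0))

cf = CF.of_point((1.0, 1.0)).merge(CF.of_point((3.0, 3.0)))
```

Here `cf.centroid()` and `cf.radius()` are computed from the three stored quantities alone, without revisiting the original points.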
Density-based clustering methods include DBSCAN and OPTICS. Grid-based methods such as STING and wavelet-based methods such as WaveCluster are also discussed. Model-based clustering includes the expectation-maximization (EM) algorithm and AutoClass. Conceptual clustering methods include COBWEB and CLASSIT. Neural network approaches include self-organizing maps (SOMs) and competitive learning.
Scalable methods for clustering categorical data are also covered. Subspace methods for high-dimensional data include CLIQUE, PROCLUS, and ENCLUS. Methods for clustering stream data include k-median-based algorithms and methods for evolving data streams. Constraint-based clustering methods include CLTree and methods for spatial clustering with obstacles.
Outlier detection methods include statistical, distance-based, density-based, and deviation-based approaches. The chapter also covers various references and citations of key works in the field of clustering.
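As an illustration of the distance-based approach to outlier detection, the sketch below uses the common DB(pct, dmin) formulation: an object is an outlier if at least a fraction pct of the objects in the data set lie farther than dmin from it. The function name `db_outliers` is hypothetical, and the brute-force scan is an assumption chosen for clarity rather than an efficient algorithm.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def db_outliers(points, pct, dmin):
    """Return the points that are DB(pct, dmin)-outliers, by brute force."""
    n = len(points)
    outliers = []
    for o in points:
        # Count the objects lying farther than dmin from o.
        far = sum(1 for q in points if dist(o, q) > dmin)
        if far >= pct * n:
            outliers.append(o)
    return outliers

sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
found = db_outliers(sample, pct=0.75, dmin=2.0)
```

This pairwise scan takes time quadratic in the number of points, so it is suitable only for small data sets.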