Fast Algorithms for Projected Clustering

1999 | Charu C. Aggarwal, Cecilia Procopiuc, Joel L. Wolf, Philip S. Yu, Jong Soo Park
The paper introduces projected clustering, a clustering model for high-dimensional data in which each cluster may live in its own subset of dimensions. Traditional clustering methods often break down in high-dimensional spaces because the data become sparse in the full space. Projected clustering addresses this by considering a different subset of dimensions for each cluster. The proposed algorithm, PROCLUS, finds clusters in small projected subspaces and associates each cluster with the dimensions relevant to it. The number of dimensions may vary from cluster to cluster, and the output is a partition of the data points into clusters together with the relevant dimensions for each cluster.

The algorithm is evaluated on synthetic data and compared with CLIQUE, a density-based subspace clustering method. PROCLUS outperforms CLIQUE in both accuracy and scalability, especially on high-dimensional data. It is also efficient and robust: its running time is only slightly affected by the average cluster dimensionality. The method is particularly useful for applications that need both an accurate clustering and an understanding of which dimensions matter for each cluster. The paper also presents a theoretical analysis of the algorithm's robustness and empirical results demonstrating its effectiveness across a range of scenarios.
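To make the idea of "each cluster with its own dimensions" concrete, here is a minimal Python sketch of a single step: given a fixed set of medoids, choose a dimension subset for each one, then assign every point to the medoid that is closest when distance is measured only over that medoid's dimensions. Distances use the Manhattan segmental distance, the average absolute difference over the selected dimensions. This is a simplified illustration loosely modeled on PROCLUS, not the authors' code. It omits medoid initialization, the iterative medoid-replacement search, and outlier handling. The helper names (`find_dimensions`, `assign_points`, `segmental_distance`), the locality rule (points strictly closer to a medoid than its nearest other medoid), and the Z-score-based greedy selection are assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def manhattan(x, y):
    """Full-space Manhattan (L1) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def segmental_distance(x, y, dims):
    """Manhattan segmental distance: mean |x_j - y_j| over the given dims only."""
    return sum(abs(x[j] - y[j]) for j in dims) / len(dims)


def find_dimensions(points, medoids, avg_dims):
    """Hypothetical helper: choose a dimension subset for each medoid.

    Selects len(medoids) * avg_dims dimensions in total (at least 2 per medoid),
    assuming every point has at least 2 dimensions.
    """
    k, d = len(medoids), len(points[0])
    candidates = []  # (z_score, medoid_index, dimension)
    for i, m in enumerate(medoids):
        # delta_i: full-space distance from m to its nearest other medoid.
        delta = min((manhattan(m, o) for t, o in enumerate(medoids) if t != i),
                    default=math.inf)
        # Locality: points strictly closer to m than delta_i (fallback: m alone).
        local = [p for p in points if manhattan(p, m) < delta] or [m]
        # x[j]: average distance from m along dimension j within the locality.
        x = [mean(abs(p[j] - m[j]) for p in local) for j in range(d)]
        y, sigma = mean(x), pstdev(x)
        for j in range(d):
            # Negative Z means the locality is unusually tight along dimension j.
            z = (x[j] - y) / sigma if sigma > 0 else 0.0
            candidates.append((z, i, j))
    candidates.sort()
    chosen = [[] for _ in range(k)]
    # Greedy pass 1: the two lowest-Z dimensions for every medoid.
    for _, i, j in candidates:
        if len(chosen[i]) < 2:
            chosen[i].append(j)
    # Greedy pass 2: fill the remaining k*avg_dims - 2k slots with the lowest leftovers.
    remaining = k * avg_dims - 2 * k
    for _, i, j in candidates:
        if remaining <= 0:
            break
        if j not in chosen[i]:
            chosen[i].append(j)
            remaining -= 1
    return [sorted(dims) for dims in chosen]


def assign_points(points, medoids, dims):
    """Assign each point to the medoid with the smallest segmental distance."""
    labels = []
    for p in points:
        dists = [segmental_distance(p, m, dims[i]) for i, m in enumerate(medoids)]
        labels.append(dists.index(min(dists)))
    return labels


if __name__ == "__main__":
    # Toy data: cluster A is tight in dims 0-1, cluster B is tight in dims 2-3.
    pts = [(0, 0, 0, 0), (0, 1, -5, -9), (1, 0, -9, -5),
           (100, 100, 100, 100), (105, 109, 100, 101), (109, 105, 101, 100)]
    meds = [pts[0], pts[3]]
    dims = find_dimensions(pts, meds, avg_dims=2)
    print(dims)                             # [[0, 1], [2, 3]]
    print(assign_points(pts, meds, dims))   # [0, 0, 0, 1, 1, 1]
```

Averaging over the selected dimensions, rather than summing, keeps distances comparable between clusters that use different numbers of dimensions. In a complete projected-clustering algorithm, this step would run inside a loop that replaces poorly performing medoids and repeats the dimension selection.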