[slides] Fast algorithms for projected clustering

The paper introduces projected clustering, a generalization of feature selection that allows different subsets of data points to be clustered on different sets of dimensions. Traditional feature selection often fails in high-dimensional spaces because it assumes that a single set of relevant dimensions applies to every cluster, which is frequently not the case. To address this, the authors propose PROCLUS, an algorithm that finds clusters in small projected subspaces by selecting a specific set of dimensions for each cluster.

PROCLUS runs in three phases: initialization, an iterative phase, and refinement. The initialization phase uses a greedy method to find a superset of a piercing set of medoids, that is, a candidate set that should contain at least one point from each cluster. The iterative phase then performs hill climbing to improve the quality of the medoids and, for each medoid, determines an appropriate set of dimensions based on the points in the locality of that medoid. The paper includes a theoretical analysis and empirical results on synthetic data that demonstrate the effectiveness and scalability of PROCLUS compared with CLIQUE, another method for finding dense regions in high-dimensional data.
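The initialization and iterative phases described above can be illustrated with a short NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' PROCLUS. All function names are hypothetical. Distances are Manhattan. Each medoid keeps the `l` dimensions along which the points in its locality lie closest to it on average, which is a simplification of the paper's dimension rule. Hill climbing swaps one random medoid per step instead of following the paper's replacement rule, and the refinement phase is omitted. The sketch also assumes distinct data points, `k >= 2`, and `num_candidates > k`.

```python
import numpy as np


def greedy_medoid_candidates(points, num_candidates, rng):
    """Farthest-first greedy pick (hypothetical helper): each new candidate is
    the point farthest (Manhattan) from all candidates chosen so far, so the
    candidates spread out and ideally include a point from every cluster."""
    chosen = [int(rng.integers(len(points)))]
    nearest = np.abs(points - points[chosen[0]]).sum(axis=1)
    while len(chosen) < num_candidates:
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(nearest, np.abs(points - points[nxt]).sum(axis=1))
    return np.array(chosen)


def find_dimensions(points, medoids, l):
    """For each medoid, take its locality (points strictly closer than the
    nearest other medoid) and keep the l dimensions along which those points
    lie closest to the medoid on average. Simplified rule; assumes distinct medoids."""
    dims = []
    for i, m in enumerate(medoids):
        delta = np.abs(np.delete(medoids, i, axis=0) - m).sum(axis=1).min()
        locality = points[np.abs(points - m).sum(axis=1) < delta]
        spread = np.abs(locality - m).mean(axis=0)  # average distance per dimension
        dims.append(np.argsort(spread)[:l])
    return dims


def assign(points, medoids, dims):
    """Assign each point to the medoid with the smallest segmental distance:
    the mean absolute difference over that medoid's own dimensions."""
    costs = np.stack([np.abs(points[:, d] - m[d]).mean(axis=1)
                      for m, d in zip(medoids, dims)], axis=1)
    return costs.argmin(axis=1)


def segmental_cost(points, labels, dims):
    """Average segmental distance of each point to its cluster centroid."""
    total = 0.0
    for c, d in enumerate(dims):
        members = points[labels == c][:, d]
        if len(members):
            total += np.abs(members - members.mean(axis=0)).mean(axis=1).sum()
    return total / len(points)


def proclus_sketch(points, k, l, num_candidates, iters, seed=0):
    """Hill climbing over medoid sets drawn from the greedy candidates.
    Simplification: each step swaps one random medoid of the best set so far."""
    rng = np.random.default_rng(seed)
    cand = greedy_medoid_candidates(points, num_candidates, rng)
    current = rng.choice(cand, size=k, replace=False)
    best = None
    for _ in range(iters):
        medoids = points[current]
        dims = find_dimensions(points, medoids, l)
        labels = assign(points, medoids, dims)
        cost = segmental_cost(points, labels, dims)
        if best is None or cost < best[3]:
            best = (current.copy(), dims, labels, cost)
        # Propose a neighbour: replace one medoid with a candidate not in use.
        current = best[0].copy()
        current[rng.integers(k)] = rng.choice(np.setdiff1d(cand, current))
    return best
```

Calling `proclus_sketch(points, k, l, num_candidates, iters)` returns the best medoid indices, per-medoid dimensions, labels, and cost found. The farthest-first pick is one plausible greedy choice here. Spreading the candidates apart makes it more likely that a random `k`-subset contains a medoid from each cluster, which is the purpose of seeking a superset of a piercing set.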

Fast Algorithms for Projected Clustering

1999 | Charu C. Aggarwal, Cecilia Procopiuc, Joel L. Wolf, Philip S. Yu, Jong Soo Park