Principal component analysis for clustering gene expression data

Vol. 17 no. 9 2001 Pages 763–774 | K. Y. Yeung* and W. L. Ruzzo
The paper by K. Y. Yeung and W. L. Ruzzo investigates how well principal components (PCs) capture cluster structure in gene expression data. The authors compare the quality of clusters obtained from the original data with the quality of clusters obtained after projecting the data onto subsets of the principal component axes, using both real and synthetic gene expression datasets. Key findings include:
- Clustering with PCs instead of the original variables does not necessarily improve cluster quality and often degrades it.
- The first few PCs, which capture most of the variation in the data, do not necessarily capture most of the cluster structure.
- The effect of clustering with PCs differs across clustering algorithms and similarity metrics.
- There is no clear trend in the optimal number of PCs across datasets, algorithms, and metrics.
- Clustering with random sets of PCs tends to yield slightly lower cluster quality than clustering with random orthogonal projections.

The authors recommend against applying PCA before clustering unless external information is available, since it may not enhance cluster quality. They also suggest that choosing an appropriate clustering algorithm is as important as selecting the 'appropriate' PCs.
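The comparison described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: k-means with a deterministic farthest-first initialisation stands in for the clustering algorithms, the adjusted Rand index measures agreement with known classes, PCs are computed from the covariance matrix over experiments, and all function names (`principal_axes`, `compare`, etc.) and the synthetic data are hypothetical. The sketch clusters the original data, the first m PCs, a random set of m PCs, and m random orthogonal axes, then scores each result against the known classes.

```python
# Hypothetical sketch of the evaluation protocol summarised above. k-means,
# farthest-first initialisation, the adjusted Rand index and the synthetic
# data are assumptions for illustration, not the authors' code or datasets.
from math import comb

import numpy as np


def principal_axes(X):
    """Eigenvectors of the experiment covariance matrix of X (rows = genes,
    columns = experiments), as columns sorted by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return eigvecs[:, np.argsort(eigvals)[::-1]]


def project(X, axes):
    """Project the mean-centred rows of X onto the given axes (columns)."""
    return (X - X.mean(axis=0)) @ axes


def random_orthogonal_axes(n_dims, m, rng):
    """m random orthonormal axes in R^n_dims, via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n_dims, m)))
    return q


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def _pairs(counts):
    """Number of unordered pairs, summed over a collection of counts."""
    return sum(comb(int(c), 2) for c in counts)


def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index between two labelings."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)  # contingency table of the two labelings
    index = _pairs(table.ravel())
    rows, cols = _pairs(table.sum(axis=1)), _pairs(table.sum(axis=0))
    expected = rows * cols / comb(len(ai), 2)
    max_index = (rows + cols) / 2
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)


def compare(X, truth, k, m, rng):
    """ARI against known classes for k-means on the original data, the first
    m PCs, m randomly chosen PCs and m random orthogonal axes."""
    axes = principal_axes(X)
    views = {
        "original": X,
        "first_m_pcs": project(X, axes[:, :m]),
        "random_pcs": project(X, axes[:, rng.choice(X.shape[1], m, replace=False)]),
        "random_orthogonal": project(X, random_orthogonal_axes(X.shape[1], m, rng)),
    }
    return {name: adjusted_rand_index(truth, kmeans(V, k)) for name, V in views.items()}


# Usage on hypothetical synthetic data: 3 classes x 30 genes, 10 experiments.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1, 2], 30)
X = truth[:, None] * 10.0 + rng.standard_normal((90, 10))
print(compare(X, truth, k=3, m=2, rng=rng))
```

In this toy data the classes differ along the direction of greatest variance, so the first PCs should preserve them. The summary's point is that real expression data need not behave this way, so the `first_m_pcs` score is something to measure against the other views rather than assume.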