Evaluation of clustering algorithms for protein-protein interaction networks

06 November 2006 | Sylvain Brohée* and Jacques van Helden
This study evaluates four clustering algorithms for their ability to identify protein complexes in protein-protein interaction networks: Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Super Paramagnetic Clustering (SPC), and Molecular Complex Detection (MCODE). The algorithms were tested on a graph constructed from 220 annotated protein complexes in the MIPS database. To assess robustness, 41 altered graphs were generated by randomly adding or removing edges. Each algorithm was applied to these graphs with various parameter settings, and the resulting clusters were compared with the annotated complexes. The study found that MCL is remarkably robust to graph alterations, while RNSC is more sensitive to edge deletion but less sensitive to suboptimal parameter values. The other two algorithms (SPC and MCODE) performed less well under most conditions. The analysis of high-throughput data supports the superiority of MCL for extracting protein complexes from interaction networks. The study also introduced a new matching statistic, "separation", which measures the bidirectional correspondence between clusters and complexes and provides a more accurate assessment than traditional metrics such as sensitivity and positive predictive value (PPV). MCL outperformed the other algorithms in both accuracy and separation, particularly in the presence of noise and missing data. The study highlights the importance of parameter optimization for clustering algorithms and identifies MCL as the most reliable of the four methods for identifying protein complexes in interaction networks. The findings suggest that MCL is particularly well suited to high-throughput data analysis, where noise and missing interactions are common.
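The robustness test described above (altered graphs obtained by randomly adding or removing edges) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the helper names `clique_edges` and `alter_graph`, the fraction parameters, and the assumption that proteins in the same complex are linked pairwise are all hypothetical choices made for this example.

```python
import random
from itertools import combinations


def clique_edges(complexes):
    """Link every pair of proteins sharing a complex (illustrative assumption)."""
    return {
        frozenset(pair)
        for members in complexes
        for pair in combinations(sorted(set(members)), 2)
    }


def alter_graph(edges, nodes, add_fraction, remove_fraction, seed=0):
    """Return a copy of an undirected edge set with random edges removed and added.

    Both fractions are relative to the number of edges in the input graph.
    """
    rng = random.Random(seed)
    # Sort so that results are reproducible for a given seed.
    original = sorted(edges, key=lambda e: tuple(sorted(e)))
    n_remove = round(remove_fraction * len(original))
    n_add = round(add_fraction * len(original))

    # Edge removal: keep a random subset of the original edges.
    kept = set(rng.sample(original, len(original) - n_remove))

    # Edge addition: draw from node pairs that were not linked originally.
    present = set(original)
    absent = [
        frozenset(pair)
        for pair in combinations(sorted(nodes), 2)
        if frozenset(pair) not in present
    ]
    added = set(rng.sample(absent, min(n_add, len(absent))))
    return kept | added


# Toy data (invented for this example): two small complexes.
complexes = [["p1", "p2", "p3"], ["p4", "p5"]]
nodes = {p for members in complexes for p in members}
original = clique_edges(complexes)
altered = alter_graph(original, nodes, add_fraction=0.5, remove_fraction=0.25, seed=42)
print(len(original), len(altered))
```

Varying `add_fraction` and `remove_fraction` over a grid would yield a family of altered graphs comparable in spirit to the 41 used in the study, although the exact proportions are not given here.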
The study also emphasizes the need for careful evaluation of clustering algorithms to ensure their reliability in biological applications.
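To make the matching statistics concrete, the sketch below computes sensitivity, PPV, accuracy, and separation from a complex-by-cluster contingency table. The summary above does not give the formulas, so the definitions used here (accuracy as the geometric mean of sensitivity and PPV, separation as the geometric mean of a complex-wise and a cluster-wise average) are assumptions based on the commonly cited contingency-table forms and should be checked against the full paper. The function name `matching_stats` and the toy data are hypothetical.

```python
from math import sqrt


def matching_stats(complexes, clusters):
    """Compare annotated complexes with predicted clusters (both lists of sets)."""
    n, m = len(complexes), len(clusters)
    # T[i][j] = number of proteins shared by complex i and cluster j.
    T = [[len(set(c) & set(k)) for k in clusters] for c in complexes]
    row_sums = [sum(row) for row in T]
    col_sums = [sum(T[i][j] for i in range(n)) for j in range(m)]
    total_complex_size = sum(len(set(c)) for c in complexes)
    total_assigned = sum(col_sums)

    # Sensitivity: best-covering cluster per complex, weighted by complex size.
    sn = sum(max(row) for row in T) / total_complex_size if total_complex_size else 0.0
    # PPV: best-matching complex per cluster, relative to annotated cluster members.
    ppv = (
        sum(max(T[i][j] for i in range(n)) for j in range(m)) / total_assigned
        if total_assigned
        else 0.0
    )
    acc = sqrt(sn * ppv)

    # Separation: product of the row-wise and column-wise fractions of each cell,
    # which is high only when a complex and a cluster match in both directions.
    sep_total = sum(
        T[i][j] ** 2 / (row_sums[i] * col_sums[j])
        for i in range(n)
        for j in range(m)
        if T[i][j]
    )
    sep = sqrt((sep_total / n) * (sep_total / m)) if n and m else 0.0
    return {"sensitivity": sn, "ppv": ppv, "accuracy": acc, "separation": sep}


# Toy data (invented): protein p3 is assigned to the wrong cluster.
complexes = [{"p1", "p2", "p3"}, {"p4", "p5"}]
clusters = [{"p1", "p2"}, {"p3", "p4", "p5"}]
stats = matching_stats(complexes, clusters)
print(stats)
```

In this toy case sensitivity and PPV are both 0.8, while separation is lower (13/18, about 0.72) because the misplaced protein weakens the correspondence in both directions. A perfect clustering gives 1.0 for every statistic.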