2013 January 12 | Tamás Nepusz, Haiyuan Yu, and Alberto Paccanaro
The paper introduces ClusterONE, a method for detecting potentially overlapping protein complexes from protein-protein interaction (PPI) data. ClusterONE is designed to identify dense regions in PPI networks while allowing those regions to overlap, since proteins may have multiple functions and can therefore belong to more than one complex. The search is guided by a quality measure called cohesiveness, defined as the total weight of edges that lie entirely within a group of proteins divided by the sum of the total weights of the group's internal and boundary edges.
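As a minimal sketch of the cohesiveness measure described above, assume the network is stored as a weighted adjacency dictionary of the form {protein: {neighbour: weight}}. The score of a group V is then w_in(V) / (w_in(V) + w_bound(V)), where w_in sums the edges with both endpoints in V and w_bound sums the edges with exactly one endpoint in V. The helper names build_graph and cohesiveness, and the toy network, are hypothetical and not taken from the ClusterONE implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: {protein: {neighbour: weight}}
Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build an undirected weighted graph from (u, v, weight) triples."""
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return dict(graph)


def cohesiveness(graph: Graph, group: Set[str]) -> float:
    """Weight of internal edges divided by weight of internal plus boundary edges."""
    w_in = 0.0     # edges with both endpoints inside the group
    w_bound = 0.0  # edges with exactly one endpoint inside the group
    for u in group:
        for v, w in graph[u].items():
            if v in group:
                w_in += w / 2  # each internal edge is visited from both endpoints
            else:
                w_bound += w
    total = w_in + w_bound
    return w_in / total if total > 0 else 0.0


# Toy network: a triangle A-B-C attached to D, which is attached to E.
g = build_graph([("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0),
                 ("C", "D", 1.0), ("D", "E", 1.0)])
print(cohesiveness(g, {"A", "B", "C"}))  # 3 / (3 + 1) = 0.75
print(cohesiveness(g, {"C", "D"}))       # 1 / (1 + 3) = 0.25
```

A tightly connected group with few edges leaving it scores close to 1, while a group whose edges mostly lead to the rest of the network scores close to 0.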
The algorithm consists of three steps: greedy growth of groups from seed vertices, merging of highly overlapping groups, and removal of candidate complexes that contain fewer than three proteins or have low density. ClusterONE was tested on five large-scale yeast PPI datasets and compared with seven other popular methods. It outperformed these methods in terms of the fraction of matched complexes, geometric accuracy, and maximum matching ratio, a score based on a one-to-one mapping between predicted and reference complexes. The biological relevance of the predicted complexes was assessed using co-localization scores and overrepresentation analysis of Gene Ontology annotations. ClusterONE is freely available as a user-friendly implementation, making it accessible for scientific research.
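The three steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the published ClusterONE implementation: growth here only adds boundary vertices (it never removes members), seeds are taken in order of decreasing degree among proteins not yet covered, two groups are merged when their overlap score |A ∩ B|² / (|A| · |B|) exceeds a threshold, and density is taken as the internal edge weight divided by the number of possible protein pairs. The thresholds overlap_threshold=0.8 and min_density=0.3, and all function names, are hypothetical choices for this example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # hypothetical: {protein: {neighbour: weight}}


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return dict(graph)


def internal_weight(graph: Graph, group: Set[str]) -> float:
    # Each internal edge is seen from both endpoints, hence the division by 2.
    return sum(w for u in group for v, w in graph[u].items() if v in group) / 2


def cohesiveness(graph: Graph, group: Set[str]) -> float:
    w_in = internal_weight(graph, group)
    w_bound = sum(w for u in group for v, w in graph[u].items() if v not in group)
    total = w_in + w_bound
    return w_in / total if total > 0 else 0.0


def grow(graph: Graph, seed: str) -> Set[str]:
    """Step 1 (simplified): add the boundary vertex that most increases cohesiveness."""
    group = {seed}
    current = cohesiveness(graph, group)
    while True:
        boundary = {v for u in group for v in graph[u] if v not in group}
        best, best_score = None, current
        for v in sorted(boundary):  # sorted for deterministic tie-breaking
            score = cohesiveness(graph, group | {v})
            if score > best_score:
                best, best_score = v, score
        if best is None:  # no single addition improves cohesiveness
            return group
        group.add(best)
        current = best_score


def overlap(a: Set[str], b: Set[str]) -> float:
    return len(a & b) ** 2 / (len(a) * len(b))


def merge(groups: List[Set[str]], threshold: float) -> List[Set[str]]:
    """Step 2: merge pairs whose overlap exceeds `threshold` until none remain."""
    groups = [set(g) for g in groups]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(groups)), 2):
            if overlap(groups[i], groups[j]) > threshold:
                absorbed = groups.pop(j)  # j > i, so index i stays valid
                groups[i] |= absorbed
                changed = True
                break
    return groups


def density(graph: Graph, group: Set[str]) -> float:
    n = len(group)
    return internal_weight(graph, group) / (n * (n - 1) / 2) if n > 1 else 0.0


def detect_complexes(graph: Graph, overlap_threshold: float = 0.8,
                     min_density: float = 0.3) -> List[Set[str]]:
    covered: Set[str] = set()
    candidates: List[Set[str]] = []
    # Seeds in order of decreasing degree, skipping proteins already covered.
    for seed in sorted(graph, key=lambda v: (-len(graph[v]), v)):
        if seed not in covered:
            group = grow(graph, seed)
            candidates.append(group)
            covered |= group
    merged = merge(candidates, overlap_threshold)
    # Step 3: drop groups with fewer than three proteins or low density.
    return [g for g in merged if len(g) >= 3 and density(graph, g) >= min_density]


# Toy network: two triangles joined by a single C-D edge.
g = build_graph([("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0),
                 ("D", "E", 1.0), ("D", "F", 1.0), ("E", "F", 1.0),
                 ("C", "D", 1.0)])
print(sorted(sorted(c) for c in detect_complexes(g)))
# [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Because growth is greedy, the result can depend on seed order; this sketch breaks degree ties alphabetically so that the toy example is deterministic.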
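The maximum matching ratio can be illustrated with a small brute-force sketch. It assumes the overlap score ω(A, B) = |A ∩ B|² / (|A| · |B|) between a reference complex A and a predicted complex B, searches for the one-to-one assignment of reference complexes to predicted complexes that maximizes the summed overlap, and assumes the result is normalized by the number of reference complexes. The exhaustive search is only practical for a handful of complexes; a real evaluation would use a maximum-weight bipartite matching algorithm instead. The function names are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Set


def overlap(a: Set[str], b: Set[str]) -> float:
    """Overlap score |A & B|^2 / (|A| * |B|)."""
    return len(a & b) ** 2 / (len(a) * len(b))


def maximum_matching_ratio(reference: Sequence[Set[str]],
                           predicted: Sequence[Set[str]]) -> float:
    """Best one-to-one matching weight divided by the number of reference complexes."""
    if not reference:
        return 0.0
    # Each reference complex gets a distinct predicted index, or None (unmatched).
    slots: List[Optional[int]] = list(range(len(predicted))) + [None] * len(reference)
    best = 0.0
    for assignment in permutations(slots, len(reference)):
        total = sum(overlap(ref, predicted[j])
                    for ref, j in zip(reference, assignment) if j is not None)
        best = max(best, total)
    return best / len(reference)


reference = [{"a", "b", "c"}, {"d", "e", "f"}]
# One predicted complex covering both reference complexes can match only one of them.
print(maximum_matching_ratio(reference, [{"a", "b", "c", "d", "e", "f"}]))  # 0.25
print(maximum_matching_ratio(reference, [{"a", "b", "c"}, {"d", "e", "f"}]))  # 1.0
```

The first call shows why the one-to-one constraint matters: a single predicted complex that merges two reference complexes cannot be credited for both of them.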