13 March 2007 | Roded Sharan, Igor Ulitsky and Ron Shamir
This review summarizes current computational methods for predicting protein function based on protein interaction networks. The methods are divided into two categories: direct methods, which infer function based on network connections, and module-assisted methods, which identify functional modules and use them for annotation. Direct methods rely on the principle that proteins close in the network are more likely to have similar functions. Neighborhood counting methods assign functions based on the functions of neighboring proteins, while more advanced methods use statistical scores to account for network distance and functional similarity. Graph-theoretic methods, such as cut-based and flow-based approaches, consider the global structure of the network. Markov random field (MRF) models use probabilistic assumptions to infer function based on network topology and functional annotations.
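As an illustration of the simplest direct approach, neighborhood counting can be sketched as a vote among a protein's annotated interaction partners. This is a minimal sketch, not the procedure of any specific published method: the function name `predict_by_neighbor_counting`, the `top_k` parameter, the adjacency-dictionary representation and the toy protein and function labels are all assumptions made for the example.

```python
from collections import Counter


def predict_by_neighbor_counting(network, annotations, protein, top_k=1):
    """Rank candidate functions for `protein` by how many of its direct
    neighbors carry each function (hypothetical helper for illustration)."""
    counts = Counter()
    for neighbor in network.get(protein, ()):
        # Unannotated neighbors contribute no votes.
        counts.update(annotations.get(neighbor, ()))
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [function for function, _ in ranked[:top_k]]


# Toy undirected network as an adjacency mapping (made-up protein names).
network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1"},
}
# Known annotations; P1 is the unannotated query protein.
annotations = {
    "P2": {"transport"},
    "P3": {"transport", "signaling"},
    "P4": {"signaling", "transport"},
}

prediction = predict_by_neighbor_counting(network, annotations, "P1", top_k=2)
print(prediction)
```

Raw counts ignore how common each function is in the network as a whole and look only at direct neighbors, which is the kind of limitation the statistical-score, graph-theoretic and MRF approaches are meant to address.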
Module-assisted methods first identify functional modules and then assign functions to proteins within those modules. These methods often use network topology, gene expression data, or other sources to detect modules. Several algorithms, such as MCODE, SPC, RNSC, and MCL, have been developed for module detection. The algorithms differ in whether they can detect overlapping modules and in whether they take interaction reliability into account.
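The second step of a module-assisted approach, annotation within modules, can be sketched independently of how the modules were found. The example below assumes the modules are already given as sets of proteins (no module-detection algorithm such as MCODE or MCL is implemented here) and uses a simple, assumed rule: a function is transferred to unannotated members when at least `min_fraction` of the annotated members carry it. The function name, the threshold and the toy data are hypothetical.

```python
from collections import Counter


def annotate_from_modules(modules, annotations, min_fraction=0.5):
    """Transfer functions shared by at least `min_fraction` of a module's
    annotated members to its unannotated members (illustrative rule only)."""
    predictions = {}
    for module in modules:
        annotated = [p for p in module if p in annotations]
        if not annotated:
            continue  # no known functions to transfer from this module
        counts = Counter()
        for p in annotated:
            counts.update(annotations[p])
        common = {f for f, c in counts.items()
                  if c / len(annotated) >= min_fraction}
        for p in module:
            if p not in annotations and common:
                # Overlapping modules can each contribute predictions.
                predictions.setdefault(p, set()).update(common)
    return predictions


# Two overlapping toy modules (made-up protein names); "D" belongs to both.
modules = [{"A", "B", "C", "D"}, {"D", "E", "F"}]
annotations = {
    "A": {"ribosome biogenesis"},
    "B": {"ribosome biogenesis"},
    "C": {"ribosome biogenesis", "RNA processing"},
    "E": {"DNA repair"},
}

predictions = annotate_from_modules(modules, annotations)
print(predictions)
```

Because "D" sits in both modules, it receives predictions from each, which is one reason the ability to detect overlapping modules matters for annotation.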
The review also discusses the integration of multiple data sources, such as gene expression and genetic interactions, to improve function prediction. Several studies have shown that combining different types of data can enhance the accuracy of functional annotations.
Performance comparisons of direct and module-assisted methods have shown that MRF-based methods often outperform others, while module-assisted methods like MCODE are effective in detecting dense subnetworks. However, the evaluation of these methods remains challenging due to the lack of standardized techniques for function prediction within modules.
The review concludes that while network-based methods have made significant progress, further systematic evaluation and integration of diverse data sources are needed to improve the accuracy and reliability of functional annotations. The availability of large-scale genomic data, such as gene expression and deletion phenotypes, offers new opportunities for improving function prediction. Overall, the field of network-based protein function prediction is rapidly evolving, with a growing number of methods and approaches being developed to enhance the accuracy and utility of functional annotations.