August 29, 2003 | Ronald Jansen, Haiyuan Yu, Dov Greenbaum, Yuval Kluger, Nevan J Krogan, Sambath Chung, Andrew Emili, Michael Snyder, Jack F Greenblatt & Mark Gerstein
A Bayesian network approach was developed to predict protein-protein interactions (PPIs) in yeast using genomic data. The method integrates weakly associated genomic features, such as mRNA co-expression, co-essentiality, and co-localization, to generate reliable predictions. It also combines experimental datasets, which are often noisy, to improve accuracy. The approach uses Bayesian networks to probabilistically combine multiple datasets, allowing for the integration of diverse data types and handling of missing information. The method was validated with new TAP-tagging experiments and demonstrated higher accuracy than existing experimental datasets at given sensitivity levels.
The study used a positive gold standard of known protein complexes (the MIPS complexes catalog) and constructed a negative gold standard from pairs of proteins located in different subcellular compartments. The Bayesian network was trained on these two sets and then used to predict interactions: for each protein pair, it calculated a likelihood ratio (L) from how strongly the pair's genomic features overlapped with the positive versus the negative gold standard. A pair was predicted to interact when L exceeded a threshold (L > L_cut), with L_cut set at 600 for high accuracy.
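As a rough illustration of the likelihood-ratio step described above, the following Python sketch estimates L for one binned feature from gold-standard counts and applies the L > L_cut rule. The function names, the toy counts, and the choice to multiply ratios across features (a naive-Bayes independence assumption) are all hypothetical and are not taken from the study; only the threshold of 600 comes from the text.

```python
from math import prod

L_CUT = 600  # threshold quoted in the text


def likelihood_ratio(pos_in_bin, pos_total, neg_in_bin, neg_total):
    """L = P(feature value | gold positives) / P(feature value | gold negatives)."""
    p_pos = pos_in_bin / pos_total
    p_neg = neg_in_bin / neg_total
    if p_neg == 0:
        raise ValueError("no gold-standard negatives in this bin; L is undefined")
    return p_pos / p_neg


def combined_ratio(ratios):
    """Assumption: features are conditionally independent, so their ratios multiply."""
    return prod(ratios)


def predict_interaction(ratios, l_cut=L_CUT):
    """Predict an interaction when the combined likelihood ratio exceeds l_cut."""
    return combined_ratio(ratios) > l_cut


# Toy counts (invented for illustration): pairs in one feature bin vs. all pairs
l_expr = likelihood_ratio(400, 8_000, 1_000, 2_000_000)     # about 100
l_func = likelihood_ratio(2_000, 8_000, 50_000, 2_000_000)  # about 10
l_ess = likelihood_ratio(4_000, 8_000, 800_000, 2_000_000)  # about 1.25

print(predict_interaction([l_expr, l_func, l_ess]))
print(predict_interaction([l_expr, l_ess]))
```

In this toy setup no single feature comes close to the threshold, but three weakly informative features together can exceed it, which is the point of combining them probabilistically.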
The method was used to build two probabilistic interactomes (PIs). The PIE (probabilistic interactome experimental) combined four existing high-throughput interaction datasets, while the PIP (probabilistic interactome predicted) was generated de novo from genomic features such as mRNA expression, biological function, and essentiality. The PIP showed higher sensitivity than the PIE, indicating better coverage of interactions. Together, the two interactomes provided a comprehensive view of yeast interactions, each offering a different balance of accuracy and coverage.
The study demonstrated that Bayesian networks outperformed voting procedures in integrating diverse data sources, especially when datasets were non-binary. The method was validated through TAP-tagging experiments, confirming several predicted interactions. The approach was also applied to other organisms, suggesting its potential for broader use in predicting PPIs. The Bayesian network's ability to handle different data formats and to weight sources according to their reliability made it a powerful tool for integrating genomic data into reliable PPI predictions.
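The contrast between voting and reliability-weighted Bayesian integration can be sketched with a toy example. The dataset names and likelihood ratios below are invented for illustration, and the sketch assumes the sources are conditionally independent so their ratios can be multiplied; it is not the network structure or the data used in the study.

```python
from math import prod

# Hypothetical likelihood ratios per source: (L if the pair is reported, L if it is not)
SOURCES = {
    "dataset_a": (50.0, 0.90),  # reliable source: a hit is strong evidence
    "dataset_b": (4.0, 0.95),   # noisier source
    "dataset_c": (3.0, 0.95),   # noisier source
}


def vote_score(observed):
    """Voting: count how many sources report the pair, ignoring reliability."""
    return sum(1 for seen in observed.values() if seen)


def bayes_score(observed):
    """Weighted integration: multiply per-source likelihood ratios.

    A value of None marks missing data and contributes a neutral factor of 1.
    """
    factors = []
    for name, seen in observed.items():
        if seen is None:
            continue
        l_yes, l_no = SOURCES[name]
        factors.append(l_yes if seen else l_no)
    return prod(factors)


pair_1 = {"dataset_a": True, "dataset_b": False, "dataset_c": False}
pair_2 = {"dataset_a": False, "dataset_b": True, "dataset_c": True}

print(vote_score(pair_1), bayes_score(pair_1))
print(vote_score(pair_2), bayes_score(pair_2))
```

Under these made-up numbers, voting ranks the second pair higher because two sources report it, while the weighted score ranks the first pair higher because its single report comes from the more reliable source.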