Predictomes: A classifier-curated database of AlphaFold-modeled protein-protein interactions


April 12, 2024 | Ernst W. Schmid, Johannes C. Walter
Predictomes is a classifier-curated database of AlphaFold-modeled protein-protein interactions (PPIs). AlphaFold-Multimer (AF-M) has the potential to fill knowledge gaps in the structural characterization of PPIs, but its standard confidence metrics do not reliably separate true from false interactions, particularly in large-scale screens. To address this, the researchers trained a Structure Prediction and Omics informed Classifier (SPOC), a machine learning classifier built on well-curated datasets of true and false interactions that incorporates both structural and biological features of protein pairs. SPOC outperforms standard metrics in separating true from false PPIs, including in proteome-wide screens. SPOC was applied to an all-by-all interaction matrix of 285 human genome maintenance proteins, generating ~40,000 predictions that can be viewed and downloaded at predictomes.org, where users can also obtain a SPOC score for their own predictions. Many of the high confidence PPIs are novel and suggest new hypotheses in genome maintenance. The study also evaluates SPOC's performance in biological discovery, showing that it outperforms other metrics in ranking functional interactions ahead of spurious ones.
SPOC was tested on a proteome-wide screen for DONSON interactors, where it placed DONSON's functional partners in the top 7 hits out of more than 20,000 pairs. This demonstrates SPOC's ability to discover PPIs ab initio in proteome-wide in silico screens. In the genome maintenance matrix, SPOC identifies many high confidence PPIs that are supported by biochemical or genetic evidence, suggesting that it is a powerful tool for detecting meaningful interactions and generating hypotheses in this field. The study also introduces a web portal for AlphaFold-Multimer predictions, predictomes.org, which lets researchers explore the genome maintenance structure prediction data through an interactive matrix and list of protein pairs, along with tools for visualizing and ranking structure predictions. The study concludes that SPOC is a valuable resource for biologists seeking to leverage the structure prediction revolution in their research, providing a framework for interpreting large-scale AF-M screens and helping lay the foundation for a proteome-wide structural interactome.
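
As a rough check on the scale of the all-by-all screen, the number of unordered protein pairs among 285 proteins can be counted directly. The sketch below is illustrative only: the function name `count_pairs` is hypothetical and not part of SPOC or predictomes.org, and because the summary does not say whether self-pairs (homodimers) were modeled, both counts are shown. Either one is consistent with the reported ~40,000 predictions.

```python
from itertools import combinations
from math import comb


def count_pairs(n_proteins: int, include_self: bool = False) -> int:
    """Count unordered pairs in an all-by-all matrix of n_proteins.

    include_self=True also counts each protein paired with itself
    (an assumption; the summary does not state whether these were run).
    """
    pairs = comb(n_proteins, 2)  # n * (n - 1) / 2 distinct pairs
    return pairs + n_proteins if include_self else pairs


n = 285  # number of genome maintenance proteins given in the summary
hetero_only = count_pairs(n)
with_self = count_pairs(n, include_self=True)

# Cross-check the closed-form count against explicit enumeration.
assert hetero_only == sum(1 for _ in combinations(range(n), 2))

print(hetero_only, with_self)
```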