Highly accurate protein structure prediction for the human proteome

26 August 2021 | Kathryn Tunyasuvunakool, Jonas Adler, Zachary Wu, Tim Green, Michal Zielinski, Augustin Žídek, Alex Bridgland, Andrew Cowie, Clemens Meyer, Agata Laydon, Sameer Velankar, Gerard J. Kleywegt, Alex Bateman, Richard Evans, Alexander Pritzel, Michael Figurnov, Olaf Ronneberger, Russ Bates, Simon A. A. Kohl, Anna Potapenko, Andrew J. Ballard, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Ellen Clancy, David Reiman, Stig Petersen, Andrew W. Senior, Koray Kavukcuoglu, Ewan Birney, Pushmeet Kohli, John Jumper & Demis Hassabis
This study applies the AlphaFold method to the human proteome, producing structure predictions that cover 98.5% of human proteins. The resulting dataset gives a confident prediction for 58% of residues, including 36% of all residues predicted with very high confidence. The study introduces metrics such as pLDDT and pTM to assess prediction confidence and to identify regions likely to be disordered. The predictions are freely available to the scientific community and are expected to enable new biological hypotheses and structural studies.

The Human Genome Project revealed a vast number of protein-coding genes, leading to extensive structural genomics efforts. Despite these efforts, only 35% of human proteins have experimentally determined structures, and those structures often cover only fragments of the sequence. Experimental structure determination is difficult for several reasons, including protein production, purification, and conformational changes. AlphaFold's success in predicting protein structures has significantly improved the accuracy and scale of structure prediction.

The study demonstrates that AlphaFold can predict full-length protein chains with high accuracy, even for challenging proteins. The results show that AlphaFold outperforms other methods in predicting multi-domain proteins and provides accurate predictions for a wide range of protein classes, including membrane proteins. The study highlights several case studies, including glucose-6-phosphatase, diacylglycerol O-acyltransferase 2, and wolframin, where high-quality predictions have provided insights into structure and function. These predictions have potential applications in drug development and in understanding protein interactions.

The study also addresses the challenge of predicting structures for disordered regions, which are common in eukaryotic proteomes.
AlphaFold's predictions for these regions are validated using disorder prediction metrics, showing that they can accurately identify disordered residues. The results suggest that a significant portion of low-confidence residues may be due to intrinsic disorder.

The study emphasizes the importance of structural biology in understanding protein function and disease. By providing a comprehensive set of high-accuracy predictions, AlphaFold enables new research directions in structural bioinformatics. The results highlight the potential of AlphaFold to transform the study of proteins, particularly for organisms with limited experimental structures. The availability of these predictions to the scientific community is expected to accelerate discoveries in structural biology and related fields.
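
The residue-level coverage figures quoted in the summary (a confident fraction and a very-high-confidence subset) come from thresholding per-residue pLDDT scores. The sketch below shows that calculation. It assumes the commonly used cut-offs of pLDDT > 70 for "confident" and pLDDT > 90 for "very high", which the summary itself does not state, and it uses made-up scores rather than real AlphaFold output. The function name `confidence_coverage` is hypothetical.

```python
from typing import Dict, Sequence

# Assumed pLDDT cut-offs (commonly used with AlphaFold output;
# they are not stated in the summary above).
CONFIDENT = 70.0
VERY_HIGH = 90.0


def confidence_coverage(plddt: Sequence[float]) -> Dict[str, float]:
    """Return the fraction of residues above each assumed pLDDT cut-off.

    The "confident" fraction includes the "very_high" residues, mirroring
    how the summary reports very high confidence as a subset.
    """
    if not plddt:
        raise ValueError("plddt must contain at least one residue score")
    n = len(plddt)
    confident = sum(1 for score in plddt if score > CONFIDENT)
    very_high = sum(1 for score in plddt if score > VERY_HIGH)
    return {"confident": confident / n, "very_high": very_high / n}


# Hypothetical per-residue scores for a 10-residue chain (illustrative only).
example_scores = [95.2, 91.0, 88.4, 72.3, 65.0, 48.9, 30.1, 93.7, 74.1, 55.5]
coverage = confidence_coverage(example_scores)
print(coverage)
```

Applied across every residue of every predicted chain, the same calculation would yield proteome-wide figures of the kind reported above, provided the same cut-offs are used.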