Evolutionary-scale prediction of atomic level protein structure with a language model

December 21, 2022 | Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives
This paper presents an approach to predicting atomic-level protein structure directly from sequence using large language models. By leveraging the evolutionary patterns these models learn, the authors achieve a significant speed-up in high-resolution structure prediction while maintaining accuracy. The study introduces ESM-2, a language model trained on protein sequences, which predicts atomic structures with a TM-score of 0.72 on the CAMEO test set and 0.55 on the CASP14 test set. ESM-2 is the basis of ESMFold, an end-to-end single-sequence structure predictor that is 60 times faster than state-of-the-art methods. The authors also present the ESM Metagenomic Atlas, a large-scale structural characterization of metagenomic proteins that includes over 225 million high-confidence predictions, many of which are novel compared with experimentally determined structures. This work demonstrates the potential of language models to accelerate and expand protein structure prediction, providing new insights into the vast diversity of proteins.
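
For readers unfamiliar with the TM-score figures quoted above, the following is a minimal sketch of how the metric is computed for one fixed superposition of a predicted structure onto a reference. It is not taken from the paper: the function name, the input format (a list of per-residue distances in ångströms for the aligned residue pairs), and the 0.5 Å floor on the distance scale for short chains are assumptions made for illustration. The full TM-score additionally searches over superpositions and reports the maximum, which this sketch does not do.

```python
def tm_score_from_distances(distances, target_length):
    """TM-score for ONE fixed superposition (hypothetical helper).

    distances: distances in angstroms between aligned residue pairs
               (predicted vs. reference) after superposition.
    target_length: number of residues in the reference structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than reference residues")

    # Length-dependent distance scale d0, which makes the score
    # comparable across proteins of different sizes.
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumption: floor d0 for very short chains

    # Each aligned pair contributes between 0 and 1; residues that are
    # not aligned contribute nothing, so partial coverage lowers the score.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # A perfect match over every residue of a 100-residue reference.
    print(tm_score_from_distances([0.0] * 100, 100))
```

The score lies between 0 and 1, with higher values indicating closer agreement with the reference structure. Scores above roughly 0.5 are commonly read as indicating the same overall fold.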