2024 | Peng Cheng, Cong Mao, Jin Tang, Sen Yang, Yu Cheng, Wuke Wang, Qiuxi Gu, Wei Han, Hao Chen, Sihan Li, Yaofeng Chen, Jianglin Zhou, Wuji Li, Aimin Pan, Suwen Zhao, Xingxu Huang, Shiqiang Zhu, Jun Zhang, Wenjie Shu, and Shengqi Wang
ProMEP is a multimodal deep representation learning model for zero-shot prediction of mutation effects on proteins. It integrates sequence and structure contexts from approximately 160 million proteins and achieves state-of-the-art performance in predicting mutational effects. ProMEP is an MSA-free method, so it can predict mutation effects for any protein with an available amino acid sequence, and it outperforms other methods especially for proteins where multiple sequence alignments (MSAs) are unavailable. It also shows high accuracy for proteins with low homology and for de novo designed proteins, and it is significantly faster than AlphaMissense. Its representations capture sequence and structure contexts well, enabling accurate prediction of functional sites and secondary structure. ProMEP accurately predicts the effects of mutations on the gene-editing enzymes TnpB and TadA and has guided the development of high-performance gene-editing tools: a 5-site mutant of TnpB achieved an editing efficiency of 74.04%, and a base editing tool built on a 15-site TadA mutant achieved an A-to-G conversion frequency of 77.27% with reduced off-target effects. Overall, ProMEP provides a powerful and efficient method for predicting mutation effects and guiding protein engineering.