Diffusion Language Models Are Versatile Protein Learners

28 Feb 2024 | Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu
This paper introduces DPLM, a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences. DPLM is pre-trained using a discrete diffusion probabilistic framework, enabling it to generate structurally plausible, novel, and diverse protein sequences. The model excels in understanding proteins, making it a superior representation learner that can be fine-tuned for various predictive tasks, outperforming ESM2. DPLM also supports conditional generation in multiple ways, including sequence conditioning, cross-modal conditioning, and plug-and-play classifier guidance for controlling sequence generation. The model is capable of generating protein sequences with desired properties, such as specific secondary structures.

DPLM is evaluated on various tasks, including unconditional generation, protein predictive tasks, and conditional generation. It outperforms existing models in terms of foldability, novelty, diversity, and learning performance. The results show that DPLM is a powerful tool for protein sequence generation and representation learning, with the potential for broader applications in protein research.
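To make the discrete diffusion idea concrete, the following is a minimal, illustrative sketch of absorbing-state (masked) discrete diffusion over amino-acid sequences: the forward process randomly replaces residues with a mask token, and a reverse step fills masked positions back in. All names here (`MASK`, `corrupt`, `denoise`, the uniform `toy_predict`) are hypothetical stand-ins, not DPLM's actual implementation; in DPLM the predictor is the trained Transformer and denoising proceeds over many steps.

```python
import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
MASK = "?"  # illustrative mask token; DPLM's real vocabulary differs

def corrupt(seq, t, T, rng):
    # Forward (noising) process of absorbing-state discrete diffusion:
    # each residue is independently replaced by MASK with probability t/T.
    return [MASK if rng.random() < t / T else aa for aa in seq]

def denoise(seq, predict):
    # One reverse (denoising) step: fill every masked position using a
    # predictor. In DPLM this is the pre-trained language model; here it
    # is a toy stand-in so the sketch runs on its own.
    return [predict(i, seq) if aa == MASK else aa for i, aa in enumerate(seq)]

rng = random.Random(0)
toy_predict = lambda i, seq: rng.choice(AMINO_ACIDS)  # uniform stand-in model

x0 = list("MKTAYIAKQR")               # short toy sequence
xt = corrupt(x0, t=7, T=10, rng=rng)  # heavily corrupted sample
x_hat = denoise(xt, toy_predict)      # fully denoised in one step (toy)
print("".join(x_hat))
```

Conceptually, training teaches the model to recover the original residues at masked positions, and generation runs the reverse process from a fully masked sequence; conditioning signals (sequence, structure, or classifier guidance) steer which residues the predictor fills in.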