21 Mar 2024 | Yan Wang, Lihao Wang, Yuning Shen, Yiqun Wang, Huizhuo Yuan, Yue Wu, Quanquan Gu
This paper introduces CONFDIFF, a force-guided SE(3) diffusion model for protein conformation generation. Existing score-based diffusion models have difficulty incorporating physical prior knowledge to guide the generation process, so their samples can deviate from the equilibrium distribution. CONFDIFF integrates a force-guided network with a mixture of data-based score models, enabling the generation of protein conformations with rich diversity while preserving high fidelity. The model is trained on general protein structures from the PDB and self-generated conformation samples, without relying on MD simulation data.
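As background for why a force is a natural guidance signal, the standard relation between the equilibrium (Boltzmann) distribution and the physical force can be written out. The symbols below ($E$ for the energy function, $k_B T$ for the thermal energy, $Z$ for the normalizing constant, $F$ for the force) are generic notation assumed here, not necessarily the paper's own.

```latex
% Boltzmann (equilibrium) distribution over conformations x
p(x) = \frac{1}{Z} \exp\!\left(-\frac{E(x)}{k_B T}\right)

% Its score is the physical force scaled by the thermal energy
\nabla_x \log p(x) = -\frac{\nabla_x E(x)}{k_B T} = \frac{F(x)}{k_B T},
\qquad F(x) = -\nabla_x E(x)
```

Because the score of the equilibrium distribution is proportional to the force, adding a force term to a learned score pushes samples toward lower-energy, more Boltzmann-like conformations.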
The key contributions of this work include: (1) employing a sequence-conditional model to guide the unconditional model, achieving a better trade-off between conformation quality and diversity; (2) utilizing the MD energy function as a physics-based reward to guide the generation of protein conformations, along with an intermediate force guidance strategy during the diffusion sampling process; and (3) demonstrating that the method outperforms state-of-the-art approaches in various protein conformation prediction tasks, including 12 fast-folding proteins and BPTI.
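The first two contributions can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: a one-dimensional harmonic energy stands in for the MD energy function, a broad Gaussian stands in for the learned data score, the force is analytical rather than predicted by a network, and the function names and the guidance strength `eta` are hypothetical. It is not the CONFDIFF implementation, only a toy showing how two scores can be mixed and how a force term can steer sampling toward lower energy.

```python
import numpy as np


def energy(x, k=1.0):
    """Toy harmonic energy E(x) = 0.5 * k * x^2 (stand-in for an MD energy)."""
    return 0.5 * k * x**2


def force(x, k=1.0):
    """Force is the negative energy gradient: F(x) = -dE/dx = -k * x."""
    return -k * x


def data_score(x, sigma=2.0):
    """Score of a broad Gaussian N(0, sigma^2), a stand-in for a learned score."""
    return -x / sigma**2


def mix_scores(score_cond, score_uncond, w):
    """Linear mix of two scores: w=1 is fully conditional, w=0 fully unconditional."""
    return w * score_cond + (1.0 - w) * score_uncond


def force_guided_score(score, x, eta, kT=1.0):
    """Add a force term, scaled by the hypothetical strength eta, to a score."""
    return score + eta * force(x) / kT


def langevin_sample(score_fn, n_chains=2000, step=0.02, n_steps=1500, seed=0):
    """Unadjusted Langevin dynamics: x <- x + step * score + sqrt(2 * step) * noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * rng.standard_normal(n_chains)
    return x


def demo():
    """Return mean energies of samples drawn without and with force guidance."""
    plain = langevin_sample(lambda x: data_score(x))
    guided = langevin_sample(lambda x: force_guided_score(data_score(x), x, eta=1.0))
    return energy(plain).mean(), energy(guided).mean()


if __name__ == "__main__":
    e_plain, e_guided = demo()
    print(f"mean energy without guidance: {e_plain:.3f}")
    print(f"mean energy with guidance:    {e_guided:.3f}")
```

In this toy setting the guided samples concentrate nearer the energy minimum, so their mean energy is lower, at the cost of a narrower spread. The quality versus diversity trade-off described above is the same tension in a far higher-dimensional space.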
Experiments show that CONFDIFF generates conformations with lower energy and better compliance with the Boltzmann distribution. The model's performance is evaluated on benchmark datasets, including fast-folding proteins and BPTI, with results indicating superior accuracy and diversity compared to existing methods. The force-guided approach effectively improves conformation stability without significantly reducing diversity, and the model excels in predicting metastable states of BPTI. The study highlights the potential of integrating physical guidance into diffusion models for more accurate and biologically relevant protein conformation generation.