The paper introduces CONFID, a force-guided SE(3) diffusion model for generating protein conformations. Traditional physics-based methods like molecular dynamics (MD) simulations suffer from rare event sampling and long equilibration times, limiting their applicability to general protein systems. Deep generative models, particularly diffusion models, have been used to generate novel protein conformations, but existing score-based methods lack physical prior knowledge to guide the generation process, leading to deviations from the equilibrium distribution.
CONFID addresses these limitations by incorporating a force-guided network with a mixture of data-based score models. This approach allows the model to generate protein conformations with rich diversity while preserving high fidelity. The model is trained on protein structures from the Protein Data Bank (PDB) and self-generated conformation samples, without relying on MD simulation data.
The main contributions of this work include:
1. Using a sequence-conditional model to guide an unconditional model, achieving a better balance between conformation quality and diversity.
2. Utilizing the MD energy function as a physics-based reward to guide the generation of protein conformations.
3. Proposing an intermediate force guidance strategy for the diffusion sampling process, presented as the first force-guided network suitable for protein conformation generation.
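To make the first two ideas concrete, here is a minimal sketch of how a guided score could be assembled. It is not the paper's implementation. The function names, the weights `gamma` and `eta`, and the harmonic toy energy are all hypothetical, and the SE(3) structure of real protein frames is ignored. The score blend follows the generic classifier-free-guidance pattern. The force term relies on the fact that for a Boltzmann distribution p(x) ∝ exp(-E(x)/kT), the score ∇ log p(x) equals the force -∇E(x) divided by kT. The sketch evaluates an analytical force on the current sample, whereas the summary describes a learned network that supplies intermediate force guidance during sampling.

```python
import numpy as np

def mix_scores(score_uncond, score_cond, gamma):
    """Blend unconditional and sequence-conditional scores.

    gamma = 0 returns the unconditional score, gamma = 1 the conditional one.
    (Generic classifier-free-guidance-style mixing; an assumption, not the
    paper's exact formula.)
    """
    return (1.0 - gamma) * score_uncond + gamma * score_cond

def toy_force(x):
    """Force of a hypothetical harmonic energy E(x) = 0.5 * |x|^2.

    The force is the negative energy gradient, so F(x) = -x. A real system
    would use an MD force field here instead.
    """
    return -x

def guided_score(score_uncond, score_cond, x, gamma, eta, kT=1.0):
    """Add a physics term F(x) / kT, scaled by eta, to the blended data score."""
    data_score = mix_scores(score_uncond, score_cond, gamma)
    return data_score + eta * toy_force(x) / kT

# Toy usage with placeholder scores for one 2D "conformation".
x = np.array([[1.0, -2.0]])
s_uncond = np.zeros_like(x)
s_cond = np.ones_like(x)
s = guided_score(s_uncond, s_cond, x, gamma=0.5, eta=0.1)
```

In this toy setting, raising `gamma` pulls samples toward the sequence-conditional model, while raising `eta` pushes them toward low-energy regions.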
Experiments on various benchmarks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that CONFID outperforms state-of-the-art methods in terms of energy and force guidance, generating diverse samples that better adhere to the Boltzmann distribution.