Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation

30 May 2024 | Guillaume Huguet, James Vuckovic, Kilian Fatras, Eric Thibodeau-Laufer, Pablo Lemos, Riashat Islam, Cheng-Hao Liu, Jarrid Rector-Brooks, Tara Akhound-Sadegh, Michael Bronstein, Alexander Tong, Avishek Joey Bose
This paper introduces FOLDFLOW-2, a sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. FOLDFLOW-2 encodes sequences with a large pre-trained protein language model, combines structure and sequence representations in a multi-modal fusion trunk, and decodes them with a geometric transformer-based decoder. The model is trained on a dataset an order of magnitude larger than the PDB datasets used in previous work, containing both known proteins and high-quality synthetic structures. On unconditional and conditional generation tasks, FOLDFLOW-2 outperforms state-of-the-art models such as RFdiffusion in designability, novelty, and diversity.
Fine-tuning also improves secondary structure diversity, and the model can solve challenging conditional design tasks such as motif scaffolding and designing scaffolds for VHH nanobodies. The paper also discusses zero-shot equilibrium conformation sampling on unseen proteins, where FOLDFLOW-2 performs competitively with models fine-tuned on molecular dynamics data.
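For readers unfamiliar with flow matching, the following is a minimal sketch of the basic training target, restricted to the translation (position) part of a backbone frame. It is an illustration under simplifying assumptions, not the FOLDFLOW-2 implementation: it uses a straight-line path in Euclidean space, omits the rotational SO(3) component and all sequence conditioning, and the function names (`interpolate`, `target_velocity`, `flow_matching_loss`) and the toy size `n_residues` are hypothetical.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data) at time t in [0, 1]."""
    return [[(1.0 - t) * a + t * b for a, b in zip(p0, p1)]
            for p0, p1 in zip(x0, x1)]


def target_velocity(x0, x1):
    """Constant velocity of the straight path; this is what a model would regress onto."""
    return [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]


def flow_matching_loss(v_pred, u):
    """Squared error between predicted and target velocities, averaged over residues."""
    per_residue = [sum((vp - ut) ** 2 for vp, ut in zip(p, q))
                   for p, q in zip(v_pred, u)]
    return sum(per_residue) / len(per_residue)


random.seed(0)
n_residues = 8  # hypothetical toy size, not taken from the paper

# Stand-in for per-residue 3D positions from a training structure (assumption).
x1 = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_residues)]
# Noise sample of the same shape.
x0 = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_residues)]

t = 0.3
x_t = interpolate(x0, x1, t)   # input a model would see at time t
u = target_velocity(x0, x1)    # regression target at that point
```

In training, a network would take `x_t` and `t` (plus, in a sequence-conditioned model, the sequence encoding) and be penalized with `flow_matching_loss` against `u`. Sampling then integrates the learned velocity field from noise at `t = 0` to a structure at `t = 1`.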