Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design

5 Jun 2024 | Andrew Campbell*¹, Jason Yim*², Regina Barzilay², Tom Rainforth¹, Tommi Jaakkola²
The paper introduces Discrete Flow Models (DFMs), a framework that brings flow-based generative modeling, previously defined on continuous spaces, to discrete data. DFMs are built on Continuous Time Markov Chains (CTMCs): the key insight is that a CTMC can realize the discrete equivalent of continuous-space flow matching. DFMs are simple to derive, improve on existing diffusion-based approaches, and allow sampling behavior to be changed without re-training, which makes them suitable for multimodal generative tasks.

The authors apply DFMs to protein co-design, where the goal is to jointly generate a protein's structure and sequence. They develop Multiflow, a multimodal generative model that combines a DFM for sequence generation with a flow-based method for structure generation. Multiflow achieves state-of-the-art performance in protein co-design while allowing flexible generation of the sequence, the structure, or both.

Experiments on text data and protein generation demonstrate the effectiveness of DFMs. On text modeling, DFMs outperform discrete diffusion models; on proteins, Multiflow achieves competitive performance across generation tasks, including forward and inverse folding. The paper also explores how CTMC stochasticity affects the structural properties of sampled proteins, showing that it can be used to tune properties between data modalities at inference time.

The contributions of the paper are the introduction of DFMs, the development of Multiflow, and the demonstration of Multiflow's superior performance in protein co-design. The authors discuss future work, including the development of more domain-specific models and improving Multiflow's performance on all protein generation tasks.
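To make the CTMC idea more concrete, the following is a minimal sketch of generating a discrete sequence by simulating a CTMC with Euler steps. It is an illustration under stated assumptions, not the authors' implementation: the summary above does not give the rate matrix, so the sketch assumes a masking-style process in which every position starts in a mask state and, at time t, leaves it at total rate 1/(1 - t). The names `MASK`, `toy_denoiser` and `euler_ctmc_sample` are hypothetical, and the "denoiser" is a uniform placeholder standing in for a trained network.

```python
import random

MASK = 0  # hypothetical index reserved for the mask state


def toy_denoiser(x_t, vocab_size):
    """Hypothetical stand-in for a trained network.

    Returns, for each position, a distribution over clean tokens;
    here it is simply uniform over the non-mask tokens.
    """
    w = 1.0 / (vocab_size - 1)
    return [[0.0 if j == MASK else w for j in range(vocab_size)] for _ in x_t]


def euler_ctmc_sample(seq_len, vocab_size, n_steps, rng):
    """Simulate the assumed CTMC from t=0 (all mask) to t=1 with Euler steps."""
    x = [MASK] * seq_len
    for k in range(n_steps):
        p_x1 = toy_denoiser(x, vocab_size)
        # With step dt = 1/n_steps and time t = k*dt, the assumed total jump
        # rate 1/(1 - t) gives a per-step jump probability
        # dt / (1 - t) = 1 / (n_steps - k), which reaches 1 on the last step.
        jump_prob = 1.0 / (n_steps - k)
        for i in range(seq_len):
            if x[i] == MASK and rng.random() < jump_prob:
                # Jump destination is drawn from the predicted clean-token distribution.
                x[i] = rng.choices(range(vocab_size), weights=p_x1[i])[0]
    return x


sample = euler_ctmc_sample(seq_len=16, vocab_size=5, n_steps=10, rng=random.Random(0))
print(sample)
```

Because the jump probability is 1 on the final step, every position has left the mask state by t = 1. Replacing `toy_denoiser` with a learned model would not change the simulation loop, which is one way to picture how sampling choices (step count, jump schedule) can vary without re-training.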