Designing DNA With Tunable Regulatory Activity Using Discrete Diffusion

May 24, 2024 | Anirban Sarkar*, Ziqi Tang, Chris Zhao, Peter K Koo*
DNA Discrete Diffusion (D3) is a generative model for conditionally sampling regulatory DNA sequences with targeted functional activity levels. It learns the underlying distribution of functional sequences and generates new sequences conditioned on desired attributes. The model uses a discrete diffusion framework that incorporates score entropy and a forward process tailored for genomics data. The authors also introduce a set of evaluation metrics that assess the functional similarity, sequence similarity, and regulatory composition of generated sequences. Benchmarked on three high-quality functional genomics datasets (human promoters, fly enhancers, and cell-type-specific MPRA data), D3 outperforms existing methods on functional, sequence, and compositional similarity, capturing the diversity of cis-regulatory grammars and generating sequences that more accurately reflect the properties of genomic regulatory DNA. It also shows strong results in task-specific design, where sequences are generated with desired activity levels across different tasks. D3-generated sequences can additionally augment supervised models and improve their predictive performance, even in data-limited scenarios. The study highlights the potential of diffusion-based generative models to learn the complex cis-regulatory mechanisms underlying gene regulation, opening new avenues for targeted sequence design in biomedicine and synthetic biology. The broader impact of this research includes the potential to accelerate the development of new genetic therapies, enhance stem cell differentiation protocols, and transform synthetic biology by reprogramming genetic circuits.
However, the research also emphasizes the need for robust governance frameworks to ensure the safe and ethical use of engineered regulatory sequences.
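To make the idea of a forward process over discrete DNA tokens more concrete, here is a minimal sketch of one common choice, a uniform corruption process. This summary does not say which forward process D3 uses, so the uniform transition, the function name `uniform_forward_noise`, and the parameter `sigma_bar` (accumulated noise) are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random

ALPHABET = "ACGT"


def uniform_forward_noise(seq, sigma_bar, rng):
    """Corrupt a DNA string with a uniform forward process (illustrative only).

    Assumption: each position is independently resampled uniformly from
    ACGT with probability 1 - exp(-sigma_bar), where sigma_bar is the
    accumulated noise level. A resampled position may land on its
    original base, so the chance of keeping a base is slightly higher
    than exp(-sigma_bar).
    """
    p_resample = 1.0 - math.exp(-sigma_bar)
    return "".join(
        rng.choice(ALPHABET) if rng.random() < p_resample else base
        for base in seq
    )


# Hypothetical usage: more accumulated noise pushes the sequence
# toward a uniformly random string over the four nucleotides.
rng = random.Random(0)
clean = "ACGTACGTAC"
for sigma_bar in (0.0, 0.5, 5.0):
    print(sigma_bar, uniform_forward_noise(clean, sigma_bar, rng))
```

In a diffusion model of this kind, a network would be trained to reverse such corruption step by step; conditioning on a target activity level is what would let sampling be steered toward sequences with the desired function.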