Designing DNA With Tunable Regulatory Activity Using Discrete Diffusion

May 24, 2024 | Anirban Sarkar, Ziqi Tang, Chris Zhao, Peter K Koo
The paper introduces DNA Discrete Diffusion (D3), a generative framework for conditionally sampling regulatory DNA sequences with targeted functional activity levels. D3 addresses the challenge of designing DNA sequences that precisely control gene expression in specific cell types, which is crucial for medicine and biotechnology. The authors develop a comprehensive suite of evaluation metrics to assess the functional similarity, sequence similarity, and regulatory composition of generated sequences. Through benchmarking on three high-quality functional genomics datasets (human promoters, fly enhancers, and cell-type-specific MPRA), D3 outperforms existing methods in capturing the diversity of cis-regulatory grammars and generating sequences that more accurately reflect the properties of genomic regulatory DNA. Additionally, D3-generated sequences effectively augment supervised models, improving their predictive performance even in data-limited scenarios. The paper highlights the potential of diffusion-based generative models in learning complex cis-regulatory mechanisms and opens new avenues for targeted sequence design in biomedicine and synthetic biology.
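To make the idea of a sequence-similarity metric concrete, here is a minimal sketch that compares the k-mer composition of generated sequences against a reference set using the Jensen-Shannon distance. This is an illustration only, not the paper's evaluation code: the summary does not specify which metrics the authors use, and the function names `kmer_distribution` and `js_distance`, the choice of k = 3, and the toy sequences are all assumptions made here.

```python
from collections import Counter
from itertools import product
from math import log2, sqrt


def kmer_distribution(sequences, k=3):
    """Return normalized frequencies of all 4**k k-mers over A/C/G/T.

    Hypothetical helper for illustration; k-mers containing other
    characters (e.g. N) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        raise ValueError("no valid k-mers found in the input sequences")
    return [counts[km] / total for km in kmers]


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs), bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 whenever x > 0.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return sqrt(max(0.0, divergence))  # guard against tiny negative rounding


# Toy data (made up for this sketch, not from any dataset in the paper).
reference = ["ACGTACGTAC", "TTGACGTCAA"]
generated = ["ACGTACGTTC", "TTGACGTCAT"]
distance = js_distance(kmer_distribution(reference),
                       kmer_distribution(generated))
```

A distance near 0 means the two sets have nearly identical k-mer composition, while a distance of 1 means they share no k-mers at all. A metric of this kind captures only sequence-level statistics; it says nothing about functional activity, which would require a separate predictive model.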