Designing realistic regulatory DNA with autoregressive language models

The paper presents regLM, a framework that combines autoregressive language models with supervised sequence-to-function models to design synthetic *cis*-regulatory elements (CREs) with desired properties, such as high, low, or cell type-specific activity. The framework builds on HyenaDNA, a foundation model pretrained on the human genome, and adapts it to generate regulatory elements. The authors demonstrate regLM by designing synthetic yeast promoters and cell type-specific human enhancers. The generated CREs are evaluated for biological realism and compared to experimentally validated CREs, showing high concordance with known regulatory syntax. The study also explores the interpretability of the regLM model and discusses potential advantages and disadvantages of using language models for regulatory DNA design.
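
The pairing of a generative model with a supervised sequence-to-function model can be illustrated with a toy generate-then-filter loop. This is only a sketch under strong assumptions, not the regLM implementation: `toy_generate` is a hypothetical biased random sampler standing in for the language model, `toy_predict_activity` is a hypothetical scorer (GC fraction) standing in for a trained sequence-to-function model, and the labels `"high"` and `"low"` are illustrative names.

```python
import random


def toy_generate(label, length, rng):
    """Hypothetical stand-in for a generative model.

    Emits one base at a time, with a GC bias chosen by the requested
    activity label. A real language model would condition each base on
    the bases generated before it; this placeholder does not.
    """
    gc_prob = {"high": 0.7, "low": 0.3}[label]
    bases = []
    for _ in range(length):
        if rng.random() < gc_prob:
            bases.append(rng.choice("GC"))
        else:
            bases.append(rng.choice("AT"))
    return "".join(bases)


def toy_predict_activity(seq):
    """Hypothetical stand-in for a sequence-to-function model.

    Returns the GC fraction as a fake activity score in [0, 1].
    """
    return (seq.count("G") + seq.count("C")) / len(seq)


def design(label, n_candidates, n_keep, length=80, seed=0):
    """Generate candidates for a label, then keep the best-scoring ones."""
    rng = random.Random(seed)
    candidates = [toy_generate(label, length, rng) for _ in range(n_candidates)]
    # For "high" keep the highest predicted activity, for "low" the lowest.
    ranked = sorted(candidates, key=toy_predict_activity, reverse=(label == "high"))
    return ranked[:n_keep]


if __name__ == "__main__":
    for label in ("high", "low"):
        for seq in design(label, n_candidates=50, n_keep=3):
            print(label, round(toy_predict_activity(seq), 2), seq)
```

The point of the split is that the generator proposes sequences for a requested activity level, while an independent scorer decides which proposals to keep. Both components here are placeholders and carry no biological meaning.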

2024 | Avantika Lal, David Garfield, Tommaso Biancalani, Gokcen Eraslan