SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models

March 27, 2024 | Bernardo P. de Almeida, Hugo Dalla-Torre, Guillaume Richard, Christopher Blum, Lorenz Hexemer, Maxence Gélard, Javier Mendoza-Revilla, Priyanka Pandey, Stefan Laurent, Marie Lopez, Alexandre Laterre, Maren Lang, Uğur Şahin, Karim Beguir, Thomas Pierrot
This paper introduces SegmentNT, a novel model designed to predict the location of various genomic elements at single-nucleotide resolution. SegmentNT combines a pre-trained DNA foundation model, the Nucleotide Transformer (NT), with a 1D U-Net segmentation head to process input DNA sequences up to 30 kb in length. The model is trained on a dataset of annotations for 14 types of genomic elements in the human genome, including gene elements and regulatory elements.

Key findings include:

1. **Performance**: SegmentNT achieves high accuracy in localizing genomic elements at nucleotide precision, outperforming several ablation models and achieving superior performance on splice site detection and exon/intron structure prediction.
2. **Generalization**: The model generalizes well to sequences up to 50 kb and can be extended to different species, demonstrating strong performance on unseen animal and plant species.
3. **Efficiency**: SegmentNT is significantly faster than alternative approaches, making it suitable for large-scale applications.
4. **Complexity Handling**: The model can predict the impact of sequence variants on transcript isoforms, providing insights into gene regulation and disease.

The paper highlights the potential of DNA foundation models in tackling complex tasks in genomics, particularly at single-nucleotide resolution, and suggests that SegmentNT can be a valuable tool for the genomics community. The authors make their models available on GitHub and HuggingFace Space for further research and application.
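To make "single-nucleotide resolution" concrete, the sketch below shows one way per-nucleotide predictions could be turned into annotated intervals. It is not the authors' post-processing: it assumes the model emits one probability per nucleotide for each element type, and the track names, the toy values, the 0.5 threshold, and the helper names `probabilities_to_segments` and `annotate` are all hypothetical.

```python
from typing import Dict, List, Tuple


def probabilities_to_segments(
    probs: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals where probs >= threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # start index of the currently open segment, if any
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a new segment opens at this nucleotide
        elif p < threshold and start is not None:
            segments.append((start, i))  # segment closes before position i
            start = None
    if start is not None:
        # The segment runs to the end of the sequence.
        segments.append((start, len(probs)))
    return segments


def annotate(
    track_probs: Dict[str, List[float]], threshold: float = 0.5
) -> Dict[str, List[Tuple[int, int]]]:
    """Apply the thresholding independently to each element-type track."""
    return {
        name: probabilities_to_segments(probs, threshold)
        for name, probs in track_probs.items()
    }


# Toy input: two hypothetical tracks over an 8-nucleotide sequence.
toy = {
    "exon": [0.1, 0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.9],
    "intron": [0.0, 0.1, 0.1, 0.2, 0.9, 0.8, 0.3, 0.1],
}
annotations = annotate(toy)
print(annotations)
```

Each track is thresholded independently, so one nucleotide may fall inside intervals of several element types at once. A real pipeline would likely tune the threshold per element type and smooth or merge very short segments.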