ChatNT: A Multimodal Conversational Agent for DNA, RNA and Protein Tasks

May 2, 2024 | Guillaume Richard, Bernardo P. de Almeida, Hugo Dalla-Torre, Christopher Blum, Lorenz Hexemer, Priyanka Pandey, Stefan Laurent, Marie Lopez, Alexandre Laterre, Maren Lang, Uğur Şahin, Karim Beguir, Thomas Pierrot
ChatNT is a multimodal conversational agent designed to handle DNA, RNA, and protein sequences and to solve a variety of biologically relevant tasks. It is the first model to achieve state-of-the-art performance on the Nucleotide Transformer benchmark while being able to solve all tasks simultaneously, in English. The model is trained with a single unified objective, which allows new tasks to be integrated without task-specific changes and supports generalization across tasks. ChatNT's architecture combines a DNA encoder, an English decoder, and a projection layer that connects the two, so that biological sequences can be processed and interpreted alongside English text. The model is trained on a curated dataset of 27 genomics tasks covering various regulatory processes and species. It performs well on both classification and regression tasks, outperforming specialized models in some cases. Additionally, ChatNT introduces a perplexity-based technique to assess the confidence of its answers, which enhances its practical utility. This work lays the foundation for building general-purpose AI systems that can understand and interpret biological sequences, making them accessible to users without a coding background.
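
The summary does not describe how the perplexity-based confidence estimate is computed, so the following is only a minimal sketch of the general idea, not ChatNT's actual procedure. It assumes the decoder can return per-token log-probabilities for each candidate answer; the function names, the inverse-perplexity normalization, and the example numbers are all hypothetical.

```python
import math
from typing import Dict, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a token sequence: exp of the negative mean log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def answer_confidence(candidates: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Convert per-candidate perplexities into scores that sum to 1.

    Assumption: each candidate is scored by its inverse perplexity, so a
    lower perplexity (the model is less "surprised") gives a higher score.
    """
    inverse = {name: 1.0 / perplexity(lp) for name, lp in candidates.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical log-probabilities a decoder might assign to the tokens of two
# candidate answers for a binary classification question.
candidates = {
    "yes": [math.log(0.9), math.log(0.8)],
    "no": [math.log(0.1), math.log(0.2)],
}
scores = answer_confidence(candidates)
print(scores)
```

Under these assumptions, the answer with the lower perplexity receives the higher score, and the gap between the scores can be read as a rough confidence signal.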