ChatNT: A Multimodal Conversational Agent for DNA, RNA and Protein Tasks

May 2, 2024 | Guillaume Richard, Bernardo P. de Almeida, Hugo Dalla-Torre, Christopher Blum, Lorenz Hexemer, Priyanka Pandey, Stefan Laurent, Marie Lopez, Alexandre Laterre, Maren Lang, Ugur Sahin, Karim Beguir, Thomas Pierrot
ChatNT is a multimodal conversational agent designed to handle DNA, RNA, and protein tasks. It bridges the gap between biology foundation models and conversational agents by providing an advanced understanding of biological sequences. ChatNT achieves state-of-the-art results on the Nucleotide Transformer benchmark while solving multiple tasks simultaneously in English and generalizing to unseen questions. The work also introduces a new dataset of genomics instruction tasks, covering binary classification, multi-label classification, and regression. These biologically relevant tasks span DNA, RNA, and proteins across multiple species and processes, including transcriptomics and proteomics, and ChatNT performs on par with specialized models on them.
The model is trained to solve all tasks simultaneously with a unified objective, which allows new tasks to be integrated seamlessly and supports generalization. The framework is likewise extendable to new biological data modalities, making it a widely applicable tool for biology. ChatNT also introduces a novel perplexity-based technique for assessing the confidence of its answers, which improves model calibration. It can handle multiple sequences at once, reducing inference costs, and is designed to be accessible to users without programming backgrounds.
ChatNT is the first model of its kind and represents an initial step towards generally capable agents that understand biology from first principles. It is a significant step towards general-purpose AI for biology and medicine, demonstrating the potential of natural language models to process bio-sequence modalities and answer complex biological questions.
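The summary does not describe how the perplexity-based confidence estimate is computed, so the following is only a generic sketch of the underlying idea, not the authors' method. It assumes that per-token log-probabilities are available for each candidate answer; the function names, the candidate answers, and the numbers are all hypothetical. A lower perplexity means the model finds an answer less surprising, so the sketch normalizes inverse perplexities into scores that sum to 1.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity of a token sequence from per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # exp of the negative mean log-probability
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def answer_confidences(candidates: Dict[str, List[float]]) -> Dict[str, float]:
    """Normalize inverse perplexities so the candidate scores sum to 1.

    This normalization is an illustrative assumption, not the paper's recipe.
    """
    inverse = {name: 1.0 / perplexity(lps) for name, lps in candidates.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical log-probabilities a model might assign to two candidate answers.
candidates = {
    "yes": [math.log(0.9), math.log(0.8)],
    "no": [math.log(0.2), math.log(0.1)],
}
scores = answer_confidences(candidates)
print(scores)
```

Scores of this kind are only a starting point: to be useful as confidences they would still need to be checked against held-out accuracy, which is what calibration refers to.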