A review of large language models and autonomous agents in chemistry

This review explores the application of large language models (LLMs) and autonomous agents in chemistry, highlighting their potential to accelerate scientific discovery. LLMs have significantly impacted molecule design, property prediction, and synthesis optimization. Autonomous agents, which integrate LLMs with tools that let them interact with their environment, perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. The review covers the history, current capabilities, and design of LLMs and autonomous agents, and addresses challenges and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks. Future directions point towards more sophisticated multi-modal agents and closer collaboration between agents and experimental methods. A repository has been built to track the latest studies: https://github.com/ur-whitelab/LLMs-in-science.

The review discusses the importance of trustworthy datasets and good benchmarks for LLM applications in molecular representations, property prediction, inverse design, and synthesis prediction. It also explores the use of encoder-only and decoder-only models for these tasks, as well as multi-task and multi-modal models. Finally, it highlights the potential of LLMs in drug discovery, materials development, and other chemical applications, emphasizing the need for high-quality data and benchmarking in AI-driven chemistry.

July 29, 2024 | Mayk Caldas Ramos, Christopher J. Collison, Andrew D. White