Understanding Are LLMs Ready for Real-World Materials Discovery?

This paper explores the potential and limitations of Large Language Models (LLMs) in materials science, highlighting their current shortcomings and proposing a framework for developing Materials Science LLMs (MatSci-LLMs). While LLMs show promise in accelerating materials understanding and discovery, they currently fall short in practical applications due to their inability to comprehend and reason over complex, interconnected materials science knowledge. The paper outlines key challenges in materials science information extraction, including domain-specific notations, incomplete descriptions, and multi-modal data processing. It emphasizes the need for high-quality, multi-modal datasets sourced from scientific literature to build effective MatSci-LLMs. The paper also discusses the importance of hypothesis generation, grounded reasoning, and the integration of LLMs with real-world simulation and experimental tools. Finally, it proposes a roadmap for applying future MatSci-LLMs in materials discovery through automated knowledge base generation, in-silico material design, and self-driving materials laboratories. The ultimate goal is to enable end-to-end automation of materials design, enhancing human understanding and accelerating the discovery of new materials.

Are LLMs Ready for Real-World Materials Discovery?

7 Feb 2024 | Santiago Miret *1, N. M. Anoop Krishnan *2