SCIAGENT: Tool-augmented Language Models for Scientific Reasoning


21 Feb 2024 | Yubo Ma1*, Zhibin Gou2*, Junheng Hao3, Ruochen Xu3, Shuohang Wang3, Liangming Pan1, Yujiu Yang2, Yixin Cao5, Aixin Sun1, Hany Awadalla3, Weizhu Chen3
The paper introduces a new task setting, *tool-augmented scientific reasoning*, which aims to improve the ability of Large Language Models (LLMs) to solve scientific problems by supplying them with scalable toolsets. To support this setting, the authors construct a tool-augmented training corpus named MATH-FUNC, containing over 30,000 samples and approximately 6,000 tools. Using this corpus, they develop SciAGENT, an agent that can retrieve, understand, and use tools for scientific problem-solving. They also create a benchmark, SciTOOLBENCH, spanning five scientific domains, to evaluate how well LLMs perform with tool assistance. Experiments on SciTOOLBENCH show that SciAGENT-MISTRAL-7B outperforms other LLMs of the same size by more than 13% in absolute accuracy, and that SciAGENT-DeepMATH-7B significantly outperforms ChatGPT. The paper also includes a detailed analysis of the benefits and limitations of the proposed approach.
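To make the retrieve-then-use idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the three tools, the `TOOLSET` dictionary, and the `retrieve_tools` function are hypothetical, and the token-overlap scoring is a simple stand-in for whatever retriever the authors actually train. In a tool-augmented agent the language model would also write the code that calls the retrieved tool; here the call is made by hand.

```python
import math
import re
from typing import Callable, Dict, List, Set


# --- A toy toolset (hypothetical; stands in for a large retrievable tool library) ---

def kinetic_energy(mass: float, velocity: float) -> float:
    """Compute the kinetic energy of a moving object from its mass and velocity."""
    return 0.5 * mass * velocity ** 2


def ideal_gas_pressure(moles: float, temperature: float, volume: float) -> float:
    """Compute the pressure of an ideal gas from moles, temperature and volume."""
    gas_constant = 8.314  # J/(mol*K)
    return moles * gas_constant * temperature / volume


def circle_area(radius: float) -> float:
    """Compute the area of a circle from its radius."""
    return math.pi * radius ** 2


TOOLSET: Dict[str, Callable[..., float]] = {
    "kinetic_energy": kinetic_energy,
    "ideal_gas_pressure": ideal_gas_pressure,
    "circle_area": circle_area,
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_tools(question: str,
                   toolset: Dict[str, Callable[..., float]],
                   top_k: int = 1) -> List[str]:
    """Rank tools by token overlap between the question and each docstring.

    Assumption: overlap counting replaces a learned retriever for illustration.
    """
    question_tokens = tokenize(question)
    ranked = sorted(
        toolset,
        key=lambda name: len(question_tokens & tokenize(toolset[name].__doc__ or "")),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    question = ("What is the kinetic energy of an object "
                "with mass 2 kg and velocity 3 m/s?")
    best = retrieve_tools(question, TOOLSET)[0]
    # The arguments are supplied by hand; an agent would extract them itself.
    print(best, TOOLSET[best](2.0, 3.0))
```

Keeping the tools in a separate, searchable collection is what lets the toolset grow without retraining the model: adding a domain means adding functions with clear docstrings, not new training data.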