SCIAGENT: Tool-augmented Language Models for Scientific Reasoning

21 Feb 2024 | Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen
SCIAGENT is a tool-augmented language model for scientific reasoning. It improves the ability of large language models (LLMs) to solve scientific problems by letting them use domain-specific tools. The paper introduces a new task setting, tool-augmented scientific reasoning, in which LLMs are supplemented with scalable toolsets. This shifts the focus from building an omniscient problem solver to building a proficient tool user.

To support research in this setting, the authors construct MATH-FUNC, a training corpus of over 30,000 samples and approximately 6,000 tools. Building on MATH-FUNC, they develop SCIAGENT, which can retrieve, understand, and use tools to solve scientific problems. They also create SCITOOLBENCH, a benchmark spanning five scientific domains, to evaluate how well LLMs reason with tool assistance. Extensive experiments on SCITOOLBENCH confirm the effectiveness of SCIAGENT: SCIAGENT-MISTRAL-7B outperforms other LLMs of the same size by more than 13% in absolute accuracy, and SCIAGENT-DEEPMATH-7B substantially outperforms ChatGPT.

The paper also discusses the challenges of scientific reasoning, the importance of math skills, and the role of function-augmented solutions in strengthening LLMs' tool-use abilities. The authors further analyze the robustness of their agents and how retriever quality affects performance. Overall, the work proposes a framework for dataset construction, model training, and evaluation, and argues that tool-augmented scientific reasoning is key to advancing the capabilities of LLMs.
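The retrieve-then-use loop described above (retrieve tools for a question, read their descriptions, then call them in a function-augmented solution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Tool` class, the three-entry toolset, and the `retrieve` function are hypothetical; bag-of-words cosine similarity stands in for whatever retriever the agent actually uses; and the tool call that SCIAGENT would generate is written by hand.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical toolset entry: a callable plus the description the retriever indexes."""
    name: str
    doc: str
    fn: Callable[..., float]


def bag_of_words(text: str) -> Counter:
    # Lowercased alphanumeric tokens; an assumed stand-in for a learned retriever's encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, toolset: list[Tool], k: int = 2) -> list[Tool]:
    """Return the k tools whose descriptions are most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(toolset, key=lambda t: cosine(q, bag_of_words(t.doc)), reverse=True)
    return ranked[:k]


# A tiny hypothetical toolset (MATH-FUNC itself contains roughly 6,000 tools).
TOOLSET = [
    Tool("kinetic_energy", "compute kinetic energy of a moving mass from mass and velocity",
         lambda m, v: 0.5 * m * v ** 2),
    Tool("ideal_gas_pressure", "compute pressure of an ideal gas from moles temperature and volume",
         lambda n, T, V: n * 8.314 * T / V),
    Tool("compound_interest", "compute future value with compound interest rate and years",
         lambda p, r, years: p * (1 + r) ** years),
]

question = "What is the kinetic energy of a 2 kg mass moving with velocity 3 m/s?"
tools = retrieve(question, TOOLSET, k=1)

# Function-augmented solution: an agent would generate this call after reading the
# retrieved tool's description; here it is hand-written for illustration.
answer = tools[0].fn(2.0, 3.0)
print(tools[0].name, answer)
```

Running the sketch retrieves `kinetic_energy` and prints `kinetic_energy 9.0`. The retrieval step is where retriever quality matters: if `retrieve` ranks the wrong tool first, the downstream call computes the wrong quantity even when the solution code itself is well formed.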