The paper introduces a new task setting called *tool-augmented scientific reasoning*, which aims to enhance the ability of Large Language Models (LLMs) to solve scientific problems by equipping them with scalable toolsets. To facilitate this research, the authors construct a tool-augmented training corpus named MATH-FUNC, comprising over 30,000 samples and approximately 6,000 tools. Based on this corpus, they develop SciAGENT, an agent that retrieves, understands, and uses tools for scientific problem-solving. They also create a benchmark called SciTOOLBENCH, spanning five scientific domains, to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciTOOLBENCH show that SciAGENT-MISTRAL-7B outperforms similarly sized LLMs by more than 13 points in absolute accuracy, and that SciAGENT-DeepMATH-7B significantly outperforms ChatGPT. The paper closes with a detailed analysis of the benefits and limitations of the proposed approach.
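The retrieve-then-use workflow described above can be sketched in a few lines. This is a hypothetical illustration only: the function names, the toolset structure, and the keyword-matching retrieval are assumptions for clarity, not the paper's actual implementation (which retrieves from a learned toolset of thousands of functions).

```python
# Hypothetical sketch of tool-augmented reasoning: retrieve a relevant tool,
# then apply it to the question. Names and retrieval logic are illustrative,
# not taken from the paper.

def solve(question, toolset):
    # 1. Retrieve: pick tools whose descriptions share words with the question.
    words = question.lower().split()
    matches = [t for t in toolset if any(w in t["description"] for w in words)]
    # 2. Use: apply the first matching tool; fall back to no-tool reasoning.
    if matches:
        return matches[0]["fn"](question)
    return None  # placeholder for direct (tool-free) reasoning

def _numbers(q):
    # Extract numeric tokens from the question text.
    return [float(w) for w in q.split() if w.replace(".", "", 1).isdigit()]

# A toy toolset with one tool: computing the mean of numbers in the question.
toolset = [
    {"description": "compute the mean of numbers in the question",
     "fn": lambda q: sum(_numbers(q)) / max(1, len(_numbers(q)))},
]

print(solve("what is the mean of 2 4 6", toolset))  # 4.0
```

In the paper's setting, retrieval operates over thousands of tools rather than a toy list, and the agent must also understand each retrieved tool's interface before invoking it.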