12 Mar 2024 | Dan Zhang, Ziniu Hu, Sining Zhoubian, Zhengxiao Du, Kaiyu Yang, Zihan Wang, Yisong Yue, Yuxiao Dong, Jie Tang
The paper introduces SciGLM, a suite of scientific language models designed to enhance college-level scientific reasoning. To address the data scarcity challenge in the science domain, the authors propose a novel self-reflective instruction annotation framework. This framework leverages existing large language models (LLMs) to generate step-by-step reasoning for unlabelled scientific questions, followed by a self-reflective critic-and-revise process. Using this framework, they curated SciInstruct, a diverse and high-quality dataset covering physics, chemistry, math, and formal proofs. The ChatGLM family of language models was fine-tuned on SciInstruct, significantly improving their scientific and mathematical reasoning capabilities. Specifically, the SciGLM models consistently outperformed the base model (ChatGLM3-6B-Base) by 4.87% and the larger-scale model (32B) by 2.67%, without compromising language understanding. The authors also released SciInstruct, the fine-tuned models, and the self-reflective framework along with the code at <https://github.com/THUDM/SciGLM>. The paper discusses the construction of SciInstruct, the self-reflective annotation process, and the evaluation of the SciGLM models on various scientific and mathematical benchmarks. The results demonstrate that SciGLM not only improves performance on scientific reasoning tasks but also maintains or enhances general language understanding capabilities.
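To make the self-reflective critic-and-revise idea concrete, the Python sketch below shows one plausible way such an annotation loop could be wired up: an LLM drafts step-by-step reasoning for an unlabelled question, a critic pass flags errors, and a revision pass rewrites the draft. This is a minimal illustration under stated assumptions; the `query_llm` helper, prompt wording, and stopping rule are hypothetical placeholders and are not taken from the released SciGLM code.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to any instruction-following LLM API."""
    raise NotImplementedError


def annotate_question(question: str, max_revisions: int = 2) -> str:
    # Step 1: draft step-by-step reasoning for the unlabelled question.
    solution = query_llm(f"Solve the following problem step by step:\n{question}")

    for _ in range(max_revisions):
        # Step 2: critic pass checks the draft for reasoning or calculation errors.
        critique = query_llm(
            "Review the solution below and point out any mistakes. "
            "Reply 'OK' if it is correct.\n\n"
            f"Problem: {question}\n\nSolution: {solution}"
        )
        if critique.strip().upper().startswith("OK"):
            break  # critic found no issues; keep the current draft
        # Step 3: revise the draft using the critique.
        solution = query_llm(
            f"Problem: {question}\n\nDraft solution: {solution}\n\n"
            f"Critique: {critique}\n\n"
            "Rewrite the solution, fixing the issues raised in the critique."
        )
    return solution
```

In a dataset-construction setting, the accepted question-solution pairs produced by a loop like this would then be filtered for quality and collected into an instruction-tuning corpus such as SciInstruct.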