ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

3 Apr 2024 | Yifan Xu, Xiao Liu, Xinghan Liu, Zhenyu Hou, Yueyan Li, Xiaohan Zhang, Zihan Wang, Aohan Zeng, Zhengxiao Du, Wenyi Zhao, Jie Tang, Yuxiao Dong
ChatGLM-Math introduces a self-critique pipeline that improves mathematical problem-solving in large language models (LLMs) without sacrificing general language capabilities. The pipeline first trains a Math-Critique model that grades the correctness and reasoning of a model's mathematical answers, then uses those grades to drive two alignment stages: rejective fine-tuning (RFT), which keeps only answers the critique judges correct, and direct preference optimization (DPO), which learns from preference pairs built from critique scores. Because the feedback is generated by the model family itself, the LLM effectively learns from its own evaluations rather than from external annotation.

Implemented on ChatGLM3-32B, the approach achieves state-of-the-art results on the newly constructed MathUserEval benchmark, which assesses LLMs on real-world mathematical queries, as well as on other academic datasets, outperforming larger LLMs. At the same time, it preserves, and in places improves, general language ability, as shown by results on the AlignBench and MT-Bench datasets. The pipeline has been deployed in the online ChatGLM series. The study underscores the importance of balancing mathematical and language capabilities in LLMs and offers a framework for future research in this direction.
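To make the two alignment stages concrete, here is a minimal Python sketch of the RFT data-selection loop as the summary describes it: sample several candidate answers per question, grade each with the Math-Critique model, and keep only passing, deduplicated answers for supervised fine-tuning. This is a hypothetical reconstruction, not the paper's released code; policy_model, critique_model, the score scale, and every helper name are illustrative assumptions.

    # Hypothetical sketch of the self-critique RFT data-selection step.
    # All model interfaces below are assumed placeholders, not the paper's API.
    from dataclasses import dataclass

    @dataclass
    class Sample:
        question: str
        answer: str
        score: float  # Math-Critique grade; a 1-10 scale is assumed here

    def rft_training_data(questions, policy_model, critique_model,
                          k: int = 8, threshold: float = 8.0):
        """Collect (question, answer) pairs whose critique score passes a threshold."""
        kept = []
        for q in questions:
            # 1. Sample k candidate solutions from the current policy.
            candidates = policy_model.generate(q, num_samples=k)
            # 2. Grade each candidate's correctness and reasoning with Math-Critique.
            scored = [Sample(q, a, critique_model.score(q, a)) for a in candidates]
            # 3. Keep only answers the critique judges correct...
            passing = [s for s in scored if s.score >= threshold]
            # 4. ...and deduplicate so one question does not dominate the SFT mix.
            seen = set()
            for s in passing:
                key = s.answer.strip()
                if key not in seen:
                    seen.add(key)
                    kept.append(s)
        return kept

Under the same assumptions, DPO preference pairs can then be formed by pairing each question's best- and worst-scoring answers, a common recipe consistent with the summary's description of critique-score-driven preference optimization:

    def dpo_pairs(scored_by_question):
        """Build (question, chosen, rejected) triples from critique scores."""
        pairs = []
        for q, samples in scored_by_question.items():
            ranked = sorted(samples, key=lambda s: s.score, reverse=True)
            # Skip questions where all answers scored identically: no signal.
            if len(ranked) >= 2 and ranked[0].score > ranked[-1].score:
                pairs.append((q, ranked[0].answer, ranked[-1].answer))
        return pairs

The threshold and sampling count k are tuning knobs in this sketch: a stricter threshold yields cleaner but smaller RFT data, while more samples per question raise the chance of finding at least one passing answer for hard problems.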