JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

23 May 2024 | Kun Zhou, Beichen Zhang, Jiapeng Wang, Zhipeng Chen, Wayne Xin Zhao, Jing Sha, Zhichao Sheng, Shijin Wang, Ji-Rong Wen
This paper proposes an efficient method to enhance the mathematical reasoning ability of large language models (LLMs) by training a small LLM to synthesize a large volume of high-quality math problems for pre-training. The key idea is that small LLMs can effectively learn data synthesis capabilities. To achieve this, the authors create a dataset using GPT-4 to distill its data synthesis capability into the small LLM. They craft prompts based on human education stages to guide GPT-4 in synthesizing math problems that cover diverse knowledge and difficulty levels, and use gradient-based influence estimation to select the most valuable math-related texts. These texts are fed into GPT-4 to create a knowledge distillation dataset for training the small LLM.

With the trained small model, the authors synthesize 6 million math problems for pre-training their JiuZhang3.0 model. The whole pipeline requires only 9.3k GPT-4 API calls and pre-training on 4.6B tokens of high-quality synthetic data, roughly 20% of the total cost of existing state-of-the-art methods. Experimental results show that JiuZhang3.0 achieves state-of-the-art performance among open-source LLMs on several mathematical reasoning datasets, under both natural language reasoning and tool manipulation settings, e.g., 52.8 (JiuZhang3.0-7B) vs. 50.2 (DeepSeekMath-7B-RL) on MATH, and 89.8 (JiuZhang3.0-8×7B) vs. 86.4 (MAmmoTH2-8×7B-Plus) on GSM8k in the natural language reasoning setting. Ablation and variation studies further verify the effectiveness of each component of the method.
The authors conclude that their method is efficient and effective in improving the mathematical reasoning ability of LLMs.
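The gradient-based influence estimation mentioned above can be illustrated with a first-order (TracIn-style) approximation: each candidate text is scored by the dot product between its loss gradient and the average loss gradient on a small reference set, so that candidates whose gradient step would most reduce the reference loss rank highest. The toy logistic-regression model, the data, and the `grad_logloss` helper below are all hypothetical stand-ins for illustration only; this is a minimal sketch of the general idea, not the authors' actual implementation.

```python
import numpy as np

# Toy sketch of gradient-based influence estimation (first-order,
# TracIn-style). A tiny logistic-regression model stands in for an LLM;
# all names and data here are hypothetical, for illustration only.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logloss(w, x, y):
    # Gradient of the binary cross-entropy loss w.r.t. weights for one example.
    return (sigmoid(x @ w) - y) * x

dim = 8
w = rng.normal(size=dim)                 # current model parameters
ref_X = rng.normal(size=(16, dim))       # small reference ("target-domain") set
ref_y = (ref_X[:, 0] > 0).astype(float)
cand_X = rng.normal(size=(100, dim))     # candidate corpus examples to rank
cand_y = (cand_X[:, 0] > 0).astype(float)

# Average loss gradient over the reference set.
ref_grad = np.mean(
    [grad_logloss(w, x, y) for x, y in zip(ref_X, ref_y)], axis=0
)

# Influence score per candidate: alignment of its gradient with the
# reference gradient (higher = more likely to reduce reference loss).
scores = np.array(
    [grad_logloss(w, x, y) @ ref_grad for x, y in zip(cand_X, cand_y)]
)

# Keep the top-k highest-scoring candidates.
top_k = np.argsort(scores)[::-1][:10]
print(top_k.tolist())
```

In the paper's setting, the selected high-influence texts are the ones fed to GPT-4 to build the knowledge distillation dataset; here the ranking step is what the sketch demonstrates.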