JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

23 May 2024 | Kun Zhou, Beichen Zhang, Jiapeng Wang, Zhipeng Chen, Wayne Xin Zhao, Jing Sha, Zhichao Sheng, Shijin Wang, Ji-Rong Wen
The paper "JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models" addresses the challenge of enhancing mathematical reasoning capabilities in large language models (LLMs). To reduce the high costs associated with collecting large-scale math-related texts or using powerful LLMs like GPT-4 for problem synthesis, the authors propose an efficient method to train a small LLM for generating high-quality math problems. They create a dataset using GPT-4 to distill its data synthesis capability into a smaller model. This is achieved by crafting prompts based on human education stages and selecting valuable math-related texts using gradient-based influence estimation. The small LLM is then used to synthesize 6 million math problems for pre-training the JiuZhang3.0 model, which only requires invoking GPT-4 API 9.3k times and pre-training on 4.6B data. Experimental results show that JiuZhang3.0 achieves state-of-the-art performance on several mathematical reasoning datasets, both in natural language reasoning and tool manipulation settings. The paper also includes a detailed analysis of the approach, including ablation studies and cost estimation, demonstrating the effectiveness and efficiency of the proposed method.The paper "JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models" addresses the challenge of enhancing mathematical reasoning capabilities in large language models (LLMs). To reduce the high costs associated with collecting large-scale math-related texts or using powerful LLMs like GPT-4 for problem synthesis, the authors propose an efficient method to train a small LLM for generating high-quality math problems. They create a dataset using GPT-4 to distill its data synthesis capability into a smaller model. This is achieved by crafting prompts based on human education stages and selecting valuable math-related texts using gradient-based influence estimation. The small LLM is then used to synthesize 6 million math problems for pre-training the JiuZhang3.0 model, which only requires invoking GPT-4 API 9.3k times and pre-training on 4.6B data. Experimental results show that JiuZhang3.0 achieves state-of-the-art performance on several mathematical reasoning datasets, both in natural language reasoning and tool manipulation settings. The paper also includes a detailed analysis of the approach, including ablation studies and cost estimation, demonstrating the effectiveness and efficiency of the proposed method.