26 Feb 2024 | Zimu Lu*, Aojun Zhou*, Houxing Ren, Ke Wang, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li†
MathGenie is a novel method for generating diverse and reliable math problems and solutions using a small-scale problem-solution dataset. The method involves iterative solution augmentation, question back-translation, and verification-based solution filtering. By augmenting ground-truth solutions and translating them back into new questions, MathGenie creates a large-scale collection of augmented solutions. These solutions are then verified using code-integrated rationales to ensure their correctness. Various pretrained models, ranging from 7B to 70B parameters, are trained on the curated data, resulting in a family of models called *MathGenieLM*. These models consistently outperform existing open-source models across five representative mathematical reasoning datasets, achieving state-of-the-art performance. Specifically, MathGenieLM-InternLM2 achieves an accuracy of 87.7% on GSM8K and 55.7% on MATH, securing the best overall score among open-source language models. The main contributions of the paper include the proposed MathGenie pipeline and the demonstration of its effectiveness through extensive experiments.
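The three stages of the pipeline described above can be sketched in pseudocode. This is a minimal illustrative sketch, not the paper's implementation: the three stage functions are hypothetical stand-ins for the LLM prompting and code-execution steps MathGenie actually uses.

```python
def augment_solution(solution: str) -> str:
    # Stand-in for iterative solution augmentation: in the paper, an LLM
    # rewrites a ground-truth solution into a new, more diverse solution.
    return solution + " [augmented]"

def back_translate(solution: str) -> str:
    # Stand-in for question back-translation: in the paper, an LLM
    # generates a new question that the augmented solution answers.
    return "Question derived from: " + solution

def verify(question: str, solution: str) -> bool:
    # Stand-in for verification-based solution filtering: in the paper, a
    # code-integrated rationale is generated and executed to check that
    # the solution's answer is correct. Here we simply accept everything.
    return True

def mathgenie_pipeline(seed_pairs, rounds=2):
    """Produce verified (question, solution) pairs from seed data.

    seed_pairs: list of (question, ground-truth solution) tuples.
    rounds: number of iterative augmentation rounds per seed (hypothetical).
    """
    curated = []
    for _, solution in seed_pairs:
        current = solution
        for _ in range(rounds):
            current = augment_solution(current)   # stage 1: augment
            new_q = back_translate(current)       # stage 2: back-translate
            if verify(new_q, current):            # stage 3: filter
                curated.append((new_q, current))
    return curated
```

Each seed pair can thus yield several new, verified training pairs per round, which is how a small-scale seed dataset grows into the large-scale curated corpus used to train the MathGenieLM models.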