15 Feb 2024 | Shubham Toshniwal, Ivan Moshkov, Sean Narenthiran, Daria Gitman, Fei Jia, Igor Gitman
The paper introduces OpenMathInstruct-1, a large-scale math instruction-tuning dataset containing 1.8 million problem-solution pairs. The dataset is constructed by synthesizing code-interpreter solutions for the GSM8K and MATH benchmarks using Mixtral, a permissively licensed open-source LLM. To narrow the gap in mathematical ability between closed-source LLMs such as GPT-4 and open-source models, the authors propose novel prompting techniques combined with brute-force scaling of solution sampling. Their best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves performance competitive with GPT-distilled models on both the GSM8K (84.6%) and MATH (50.7%) benchmarks. The dataset and models are released under a permissive license to promote reproducibility and further research in mathematical reasoning. The paper also includes detailed experimental setups, ablation studies, and error analyses that validate the effectiveness of the proposed approach.
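The synthesis procedure the summary describes — sampling candidate code-interpreter solutions from an open LLM and keeping only those whose executed answer matches the ground truth — can be sketched roughly as below. This is a minimal illustration, not the paper's actual pipeline: `sample_solution` is a mocked stand-in for a real few-shot call to Mixtral, and all function names are hypothetical.

```python
# Hedged sketch of a synthesize-and-filter loop for building a dataset like
# OpenMathInstruct-1: sample code solutions, execute them, keep correct ones.
import contextlib
import io


def sample_solution(problem: str) -> str:
    """Placeholder for an LLM call (e.g. few-shot prompting Mixtral).

    Returns Python code that prints its final answer. Mocked here so the
    sketch is self-contained and runnable.
    """
    return "apples = 3 + 4\nprint(apples)"


def execute(code: str) -> str:
    """Run generated code and capture its printed output.

    A real pipeline would run this in a sandboxed code interpreter with
    timeouts; plain exec() is used here only for illustration.
    """
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
    except Exception:
        return ""  # crashing solutions are simply discarded
    return buf.getvalue().strip()


def synthesize(problem: str, answer: str, num_samples: int = 4) -> list[str]:
    """Keep only sampled solutions whose executed output matches the answer."""
    kept = []
    for _ in range(num_samples):
        code = sample_solution(problem)
        if execute(code) == answer:
            kept.append(code)
    return kept


solutions = synthesize("Tom has 3 apples and buys 4 more. How many?", "7")
```

Filtering by final-answer correctness is what makes brute-force scaling viable: sampling many solutions per problem raises the chance that at least one executes to the reference answer, even when any single sample is unreliable.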