Orca-Math: Unlocking the potential of SLMs in Grade School Math

16 Feb 2024 | Arindam Mitra, Hamed Khanpour, Corby Rosset, Ahmed Awadallah
Orca-Math is a 7-billion-parameter language model that achieves 86.81% accuracy on the GSM8K benchmark without requiring multiple model calls, external tools, or verifiers. It is trained on a synthetic dataset of 200,000 math word problems created with a multi-agent setup in which agents collaborate to generate diverse and challenging problems, and it is improved through iterative learning that combines supervised fine-tuning with preference learning. Despite its size, Orca-Math outperforms much larger models such as LLaMA-2-70B, WizardMath-70B, Gemini Pro, and ChatGPT-3.5, and it reaches this accuracy with significantly less training data than other models.

Dataset construction relies on several agents: an "Ask Me Anything" agent, which generates variations of existing problems, and "Suggester & Editor" agents, which collaborate to create more complex problems (a sketch of such a loop appears below).

The model is trained on a combination of positive and negative signals, with preference learning further enhancing its performance (see the second sketch below). Correctness is evaluated with a GPT4-based exact-match metric, and Orca-Math also achieves high accuracy on other math word problem benchmarks, including AddSub, ASDiv, MultiArith, SingleOp, SingleEq, and SVAMP. The model's success demonstrates the effectiveness of iterative learning and synthetic data generation in improving the performance of small language models on mathematical reasoning tasks.
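To make the "Suggester & Editor" idea concrete, the following is a minimal Python sketch of how such an agent loop might be wired up. It is an illustration only: the prompt wording, the `chat` callable, the function name `suggest_and_edit`, and the number of rounds are assumptions, not the paper's actual implementation.

```python
from typing import Callable

# `Chat` stands in for any instruction-following LLM endpoint:
# it takes a prompt string and returns the model's text response.
Chat = Callable[[str], str]

SUGGESTER_PROMPT = (
    "You are given a math word problem. List concrete ways to make it "
    "more challenging without making it unsolvable.\n\nProblem:\n{problem}"
)

EDITOR_PROMPT = (
    "Rewrite the following math word problem so that it incorporates the "
    "suggested modifications. Return only the rewritten problem.\n\n"
    "Problem:\n{problem}\n\nSuggestions:\n{suggestions}"
)


def suggest_and_edit(seed_problem: str, chat: Chat, rounds: int = 2) -> list[str]:
    """Iteratively harden a seed problem via a suggester/editor loop.

    Returns the chain of progressively harder problem variants.
    Prompts and round count are illustrative assumptions.
    """
    variants: list[str] = []
    problem = seed_problem
    for _ in range(rounds):
        suggestions = chat(SUGGESTER_PROMPT.format(problem=problem))
        problem = chat(EDITOR_PROMPT.format(problem=problem, suggestions=suggestions))
        variants.append(problem)
    return variants
```

In practice, both roles could be served by the same underlying model with different prompts; the key point is that each round's edited problem becomes the next round's input, yielding a chain of increasingly difficult variants from one seed.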
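The positive/negative-signal training can be pictured as follows: sample several solutions per problem from the fine-tuned model, label each by whether its final answer matches the gold answer, and turn correct/incorrect pairs into preference data. The sketch below is an assumption-laden stand-in, not the paper's code: the paper judges correctness with a GPT4-based exact-match metric, whereas this stand-in uses a simple regex on the final number, and `sample_solutions`, `build_preference_pairs`, and the pairing scheme are hypothetical names and choices.

```python
import re
from typing import Callable, Iterable

# `sample_solutions` stands in for drawing k candidate solutions from the
# current (SFT-trained) student model; its signature here is an assumption.
SampleFn = Callable[[str, int], list[str]]


def last_number(text: str) -> str | None:
    """Crude final-answer extraction: the last number in the text.

    The paper instead uses a GPT-4 judge to extract and compare answers;
    this regex is only a self-contained stand-in for that judge.
    """
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None


def build_preference_pairs(
    problems: Iterable[tuple[str, str]],  # (question, gold_answer) pairs
    sample_solutions: SampleFn,
    k: int = 4,
) -> list[dict]:
    """Sample k solutions per problem, split them into accepted/rejected by
    (approximate) exact match on the final answer, and emit preference pairs
    suitable for DPO/KTO-style preference learning."""
    pairs: list[dict] = []
    for question, gold in problems:
        candidates = sample_solutions(question, k)
        accepted = [c for c in candidates if last_number(c) == last_number(gold)]
        rejected = [c for c in candidates if last_number(c) != last_number(gold)]
        # Only problems with both a correct and an incorrect sample yield pairs.
        for good in accepted:
            for bad in rejected:
                pairs.append({"prompt": question, "chosen": good, "rejected": bad})
    return pairs
```

Pairs produced this way could then be passed to an off-the-shelf preference-optimization trainer, and the whole sample-label-train cycle repeated, which is the spirit of the iterative learning the summary describes.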