MARIO: MAth Reasoning with code Interpreter Output – A Reproducible Pipeline

21 Feb 2024 | Minpeng Liao*, Wei Luo*, Chengxi Li*, Jing Wu*, Kai Fan*
This paper introduces MARIO, a reproducible pipeline for mathematical reasoning with large language models (LLMs) that integrates textual analysis with code execution. The authors build a new math dataset from GSM8K and MATH whose solutions are enriched with Python code-interpreter outputs, produced and refined through GPT-4 annotation, human review, and self-training. They also present a protocol for fine-tuning math-specific LLMs, which improves performance on the GSM8K and MATH benchmarks. Because each solution combines textual analysis with executable code snippets, the model can reason in natural language while delegating precise calculation to the interpreter.
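How text and interpreter output can be interleaved is illustrated by the minimal Python sketch below. It assumes a hypothetical solution format in which code sits in fenced `python` blocks, runs each block with `exec` in a shared namespace (a stand-in for a real sandboxed interpreter), splices the printed result back into the text as an `output` block, and treats the last printed value as the final answer for a correctness check against the reference. The helper names (`run_code`, `execute_solution`, `is_correct`) are illustrative assumptions, not part of the MARIO release.

```python
import contextlib
import io
import re
from typing import Dict, Tuple

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the pattern readable
# Assumed (hypothetical) format: code sits between FENCE + "python" and a closing FENCE.
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_code(code: str, namespace: Dict) -> str:
    """Execute one snippet and return what it printed (no sandboxing: trusted input only)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue().strip()


def execute_solution(solution: str) -> Tuple[str, str]:
    """Run code blocks in order, splice each output back in, return (annotated text, last output)."""
    namespace: Dict = {}  # shared, so later blocks can reuse earlier variables
    last_output = ""

    def splice(match: re.Match) -> str:
        nonlocal last_output
        last_output = run_code(match.group(1), namespace)
        return f"{match.group(0)}\n{FENCE}output\n{last_output}\n{FENCE}"

    return CODE_BLOCK.sub(splice, solution), last_output


def is_correct(answer: str, reference: str, tol: float = 1e-6) -> bool:
    """Compare numerically when both sides parse as numbers, else as exact strings."""
    try:
        return abs(float(answer) - float(reference)) <= tol
    except ValueError:
        return answer.strip() == reference.strip()


# A GSM8K-style toy solution mixing text and one code block.
sample = (
    "Natalia sold 48 clips in April and half as many in May.\n"
    f"{FENCE}python\napril = 48\nmay = april // 2\nprint(april + may)\n{FENCE}\n"
    "So she sold 72 clips in total."
)
annotated, answer = execute_solution(sample)
print(annotated)
print(is_correct(answer, "72"))  # True
```

Calling `exec` without isolation is only acceptable for trusted toy inputs like this one; a real pipeline would run model-generated code in a sandbox with time and memory limits.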
The authors also introduce an outcome value model (OVM) for evaluating solution outcomes; it is trained on both correct and incorrect solutions and further improves model performance. The MARIO dataset is publicly available, and the paper describes the data-generation, fine-tuning, and inference procedures in detail. It also discusses challenges in mathematical reasoning, including the limitations of code-centric approaches and the role of common sense in problem solving. The authors report that their approach improves performance on both in-domain and out-of-domain datasets, and they stress that human review is important for ensuring the accuracy of solutions. The paper concludes with a discussion of related work and future research directions.
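One common way to use an outcome value model is sketched below in Python, under stated assumptions: each sampled solution is labeled 1.0 or 0.0 according to whether its final answer matches the reference (so both correct and incorrect solutions become training data), and at inference a trained scorer picks the best of N candidates. The `value_fn` argument is a hypothetical stand-in for the trained OVM, the toy scores are made up, and best-of-N reranking is an illustrative choice rather than a confirmed description of MARIO's inference procedure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    text: str    # full generated solution
    answer: str  # final answer extracted from the solution


def ovm_training_examples(
    question: str, candidates: List[Candidate], reference: str
) -> List[Tuple[str, str, float]]:
    """Label each sampled solution by outcome: 1.0 if correct, 0.0 if not. Both kinds are kept."""
    return [(question, c.text, 1.0 if c.answer == reference else 0.0) for c in candidates]


def rerank(
    question: str,
    candidates: List[Candidate],
    value_fn: Callable[[str, str], float],  # hypothetical stand-in for a trained OVM
) -> Candidate:
    """Best-of-N selection: return the candidate the value model scores highest."""
    return max(candidates, key=lambda c: value_fn(question, c.text))


question = "Natalia sold 48 clips in April and half as many in May. How many in total?"
candidates = [
    Candidate(text="48 + 48 = 96", answer="96"),
    Candidate(text="48 + 48 // 2 = 72", answer="72"),
]
print([label for _, _, label in ovm_training_examples(question, candidates, "72")])  # [0.0, 1.0]

toy_scores = {"48 + 48 = 96": 0.1, "48 + 48 // 2 = 72": 0.8}  # made-up scores, not model outputs
best = rerank(question, candidates, lambda q, text: toy_scores[text])
print(best.answer)  # 72
```

Keeping the incorrect solutions is what gives the value model negative examples to learn from; a model trained only on correct solutions would have no signal for ranking a wrong answer below a right one.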