MARIO: MAth Reasoning with code Interpreter Output – A Reproducible Pipeline

21 Feb 2024 | Minpeng Liao*, Wei Luo*, Chengxi Li*, Jing Wu*, Kai Fan†
The paper "MARIO: MAth Reasoning with code Interpreter Output – A Reproducible Pipeline" addresses the gap in mathematical reasoning capabilities of large language models (LLMs). It introduces a novel math dataset that integrates text analysis and code snippets, derived from GSM8K and MATH datasets, and enhanced through GPT-4 annotations, human review, and self-training processes. The dataset aims to improve LLMs' performance in arithmetic computations and exact calculations. The authors propose a reproducible pipeline for fine-tuning math-specific LLMs, including Continual Pre-training (CPT), Supervised Fine-tuning (SFT), and Multi-Task OVM Fine-tuning. This pipeline significantly enhances the performance of a 7B-parameter LLM on the GSM8K and MATH datasets. The paper also includes a detailed description of the data generation process, fine-tuning methods, and experimental results, demonstrating the effectiveness of the proposed approach in both in-domain and out-of-domain math tasks. The source code and model checkpoints are made publicly available to facilitate further research and development in mathematical reasoning within LLMs.The paper "MARIO: MAth Reasoning with code Interpreter Output – A Reproducible Pipeline" addresses the gap in mathematical reasoning capabilities of large language models (LLMs). It introduces a novel math dataset that integrates text analysis and code snippets, derived from GSM8K and MATH datasets, and enhanced through GPT-4 annotations, human review, and self-training processes. The dataset aims to improve LLMs' performance in arithmetic computations and exact calculations. The authors propose a reproducible pipeline for fine-tuning math-specific LLMs, including Continual Pre-training (CPT), Supervised Fine-tuning (SFT), and Multi-Task OVM Fine-tuning. This pipeline significantly enhances the performance of a 7B-parameter LLM on the GSM8K and MATH datasets. The paper also includes a detailed description of the data generation process, fine-tuning methods, and experimental results, demonstrating the effectiveness of the proposed approach in both in-domain and out-of-domain math tasks. The source code and model checkpoints are made publicly available to facilitate further research and development in mathematical reasoning within LLMs.