8 Nov 2021 | Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt
The paper introduces MATH, a new dataset of 12,500 challenging competition mathematics problems designed to measure the mathematical problem-solving ability of machine learning models. Each problem includes a full step-by-step solution, which can be used to teach models to generate answer derivations and explanations. To facilitate future research and improve accuracy on MATH, the authors also contribute a large auxiliary pretraining dataset called AMPS, which contains over 100,000 Khan Academy problems and over 5 million problems generated using Mathematica scripts. Despite these efforts, accuracy remains relatively low, even for large Transformer models. The authors conclude that, if current scaling trends continue, simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning, and that new algorithmic advancements will likely be needed.
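Accuracy on MATH can be scored automatically because each reference solution marks its final answer with `\boxed{...}`, so a generated solution can be graded by comparing boxed answers after light normalization. The sketch below assumes that convention. The helper names (`extract_boxed`, `normalize`, `is_correct`, `accuracy`) are hypothetical, and the normalization rules are a deliberately simplified stand-in, not the paper's official grading code.

```python
import re
from typing import List, Optional


def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a solution, handling nested braces."""
    start = solution.rfind("\\boxed{")
    if start == -1:
        return None  # no final answer marked
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)  # matching close brace of \boxed{
        out.append(ch)
        i += 1
    return None  # unbalanced braces


def normalize(answer: str) -> str:
    """Hypothetical, simplified normalization so trivially different strings compare equal."""
    s = re.sub(r"\s+", "", answer)  # drop all whitespace
    s = s.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")  # unify fraction macros
    s = s.strip("$").rstrip(".")  # strip stray math delimiters and a trailing period
    return s


def is_correct(model_solution: str, reference_solution: str) -> bool:
    """Exact match on normalized boxed answers; a missing answer counts as wrong."""
    pred = extract_boxed(model_solution)
    gold = extract_boxed(reference_solution)
    if pred is None or gold is None:
        return False
    return normalize(pred) == normalize(gold)


def accuracy(model_solutions: List[str], reference_solutions: List[str]) -> float:
    """Fraction of problems whose boxed answer matches the reference."""
    if not reference_solutions:
        return 0.0
    hits = sum(is_correct(p, r) for p, r in zip(model_solutions, reference_solutions))
    return hits / len(reference_solutions)


if __name__ == "__main__":
    ref = r"Since $x^2 = 4$ and $x > 0$, we get $x = \boxed{2}$."
    pred = r"Taking the positive root gives $\boxed{ 2 }$."
    print(is_correct(pred, ref))  # True
```

Only the final boxed answer is compared, so the intermediate steps of a generated solution are not graded; a real grader would need broader normalization (units, equivalent numeric forms) than this sketch provides.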
The paper also discusses the challenges of using step-by-step solutions and the benefits of providing partial solutions during training. Overall, the study highlights the significant gap between current models and human-level performance in mathematical problem-solving tasks.