MUSTARD: MASTERING UNIFORM SYNTHESIS OF THEOREM AND PROOF DATA

23 May 2024 | Yinya Huang, Xiaohan Lin, Zhengying Liu, Qingxing Cao, Huajian Xin, Haiming Wang, Zhenguo Li, Linqi Song, Xiaodan Liang
MUSTARD is a data-generation framework that synthesizes high-quality, diverse theorem and proof data for mathematical reasoning. It works in three stages: (1) sampling mathematical concepts as problem categories, (2) prompting a generative language model to generate problems and their step-wise formal solutions, and (3) filtering out invalid proofs with a theorem prover. The resulting benchmark, MUSTARDSAUCE, contains 5,866 validated data points, each pairing an informal statement and informal proof with a translated formal proof that passes prover validation.
Fine-tuning smaller language models on this data yields significant performance gains in automated theorem proving and on math word problems, demonstrating that the generated data improves language models' mathematical reasoning capabilities. The contributions are: a novel framework that combines the strengths of generative language models and formal theorem provers to produce large-scale, high-quality mathematical data; a benchmark dataset of diverse and challenging problems; and extensive analysis showing the dataset's quality and effectiveness.
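The three-stage pipeline described above can be sketched as a simple generate-and-filter loop. This is a minimal illustrative sketch, not the authors' implementation: all function names, the concept list, and the stubbed language-model and prover calls are assumptions made for clarity.

```python
import random

# Hypothetical concept pool; MUSTARD samples from a much larger taxonomy.
CONCEPTS = ["prime numbers", "modular arithmetic", "mathematical induction"]


def sample_concepts(k=2):
    # Stage 1: sample mathematical concepts as the problem category.
    return random.sample(CONCEPTS, k)


def generate_problem_and_proof(concepts):
    # Stage 2: in the real framework, a generative language model is
    # prompted with the sampled concepts to produce a problem and its
    # step-wise informal and formal solutions. Stubbed here.
    return {
        "concepts": concepts,
        "informal_statement": "...",
        "informal_proof": "...",
        "formal_proof": "...",
    }


def prover_validates(formal_proof):
    # Stage 3: in the real framework, a formal theorem prover checks the
    # generated formal proof; only passing proofs are kept. Stubbed here.
    return formal_proof is not None


def synthesize(n):
    # Repeat generate-and-filter until n validated data points are collected.
    data = []
    while len(data) < n:
        record = generate_problem_and_proof(sample_concepts())
        if prover_validates(record["formal_proof"]):
            data.append(record)
    return data
```

The key design point is that the prover acts as an automatic correctness filter, so the dataset quality does not depend on human checking of each generated proof.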