MUSTARD: MASTERING UNIFORM SYNTHESIS OF THEOREM AND PROOF DATA

23 May 2024 | Yinya Huang, Xiaohan Lin, Zhengying Liu, Qingxing Cao, Huajian Xin, Haiming Wang, Zhenguo Li, Linqi Song, Xiaodan Liang
MUSTARD is a data-generation framework that synthesizes high-quality, diverse theorem and proof data for mathematical reasoning. It works in three stages: (1) sampling mathematical concepts as problem categories, (2) prompting a generative language model to generate problems and their step-wise formal solutions, and (3) filtering out invalid proofs with a theorem prover. The resulting benchmark, MUSTARDSAUCE, contains 5,866 validated data points, each pairing an informal statement and informal proof with a translated formal proof that passes prover validation.
Fine-tuning smaller language models on this data yields significant performance gains in automated theorem proving and on math word problems, demonstrating that the generated data improves language models' mathematical reasoning capabilities. The contributions are: a novel framework that combines the strengths of generative language models and formal theorem provers to produce large-scale, high-quality mathematical data; a benchmark dataset of diverse and challenging problems; and extensive analysis showing the dataset's quality and effectiveness.
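The three-stage pipeline described above can be sketched as a simple generate-and-filter loop. This is a minimal illustrative sketch, not the authors' implementation: all function names, the concept list, and the stubbed language-model and prover calls are assumptions made for clarity.

```python
import random

# Hypothetical concept pool; MUSTARD samples from a much larger taxonomy.
CONCEPTS = ["prime numbers", "modular arithmetic", "mathematical induction"]


def sample_concepts(k=2):
    # Stage 1: sample mathematical concepts as the problem category.
    return random.sample(CONCEPTS, k)


def generate_problem_and_proof(concepts):
    # Stage 2: in the real framework, a generative language model is
    # prompted with the sampled concepts to produce a problem and its
    # step-wise informal and formal solutions. Stubbed here.
    return {
        "concepts": concepts,
        "informal_statement": "...",
        "informal_proof": "...",
        "formal_proof": "...",
    }


def prover_validates(formal_proof):
    # Stage 3: in the real framework, a formal theorem prover checks the
    # generated formal proof; only passing proofs are kept. Stubbed here.
    return formal_proof is not None


def synthesize(n):
    # Repeat generate-and-filter until n validated data points are collected.
    data = []
    while len(data) < n:
        record = generate_problem_and_proof(sample_concepts())
        if prover_validates(record["formal_proof"]):
            data.append(record)
    return data
```

The key design point is that the prover acts as an automatic correctness filter, so the dataset quality does not depend on human checking of each generated proof.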