23 May 2024 | Huajian Xin, Daya Guo, Zhihong Shao, Z.Z. Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang
This paper introduces DeepSeek-Prover, a method for generating large-scale synthetic proof data for formal theorem proving with large language models (LLMs). The approach translates natural-language mathematical problems into formal statements, filters out low-quality statements, and generates proofs to build the synthetic dataset. After fine-tuning the DeepSeekMath 7B model on this dataset, which comprises 8 million formal statements with proofs, the model achieves whole-proof generation accuracies of 46.3% with 64 samples and 52% cumulatively on the Lean 4 miniF2F test, surpassing the GPT-4 baseline at 23.0% and a tree-search reinforcement learning method at 41.0%. The model also proved 5 out of 148 problems in the Lean 4 Formalized International Mathematical Olympiad (FIMO) benchmark, while GPT-4 failed to prove any. The synthesis pipeline is iterative, comprising autoformalization, quality filtering, statement proving with a formal verifier, and iterative enhancement, so that each round of verified proofs strengthens the next. Case studies demonstrate the model's ability to verify proofs and identify hypothesis inconsistencies. These results show the potential of large-scale synthetic data to enhance theorem-proving capabilities in LLMs, and the synthetic dataset and model are made available to facilitate further research.
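To make the notion of a "formal statement with proof" concrete, here is an illustrative Lean 4 statement-proof pair of the kind an autoformalization pipeline produces. This example is not drawn from the paper's dataset; it simply shows the target format: an informal problem rendered as a machine-checkable theorem.

```lean
-- Illustrative only; not from the paper's dataset.
-- Informal problem: "For every natural number n, n + 0 = n."
-- Autoformalized Lean 4 statement with a kernel-checked proof:
theorem add_zero_eq (n : Nat) : n + 0 = n := rfl
```

The Lean 4 kernel accepts or rejects such proofs deterministically, which is what lets the pipeline filter model outputs: only statement-proof pairs that pass the verifier enter the training set.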
The broader impact of this research is to advance automated theorem proving by leveraging large-scale synthetic proof data generated from informal mathematical problems, enhancing the capabilities of large language models in formal theorem proving.
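The iterative pipeline described above (autoformalization, quality filtering, statement proving, iterative enhancement) can be sketched as a loop. This is a hypothetical, self-contained sketch: the helper functions below (`autoformalize`, `quality_filter`, `attempt_proof`, `lean_verify`, `finetune`) are toy stand-ins for illustration, not the paper's actual code or API.

```python
# Hypothetical sketch of the iterative proof-synthesis loop.
# All helpers are illustrative stand-ins, not DeepSeek-Prover's code.

def autoformalize(model, problem):
    # Stand-in: a real system prompts the LLM to emit a Lean 4 statement.
    return f"theorem t : {problem} := by"

def quality_filter(statement):
    # Stand-in for model-based scoring and hypothesis-consistency checks.
    return statement.startswith("theorem")

def attempt_proof(model, statement):
    # Stand-in: a real system samples whole candidate proofs from the LLM.
    return statement + " simp"

def lean_verify(proof):
    # Stand-in for checking the candidate proof with the Lean 4 kernel.
    return proof.endswith("simp")

def finetune(model, dataset):
    # Stand-in for supervised fine-tuning on verified statement-proof pairs.
    return model

def synthesize_proofs(informal_problems, model, rounds=3):
    """Run the iterative loop: formalize, filter, prove, verify, fine-tune."""
    dataset = []
    for _ in range(rounds):
        # 1. Autoformalization: translate informal problems to statements.
        statements = [autoformalize(model, p) for p in informal_problems]
        # 2. Quality filtering: discard malformed or low-quality statements.
        statements = [s for s in statements if quality_filter(s)]
        # 3. Statement proving: keep only verifier-accepted proofs.
        for s in statements:
            proof = attempt_proof(model, s)
            if proof is not None and lean_verify(proof):
                dataset.append((s, proof))
        # 4. Iterative enhancement: retrain on verified data and repeat.
        model = finetune(model, dataset)
    return dataset
```

The key design point the sketch captures is that the formal verifier acts as an exact filter: only proofs the kernel accepts are added to the training data, so each fine-tuning round trains on guaranteed-correct pairs.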