23 May 2024 | Huajian Xin, Daya Guo, Zhihong Shao, Z.Z. Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang
The paper "DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data" introduces a method to generate extensive synthetic proof data from high-school and undergraduate-level mathematical competition problems. The approach involves translating natural language problems into formal statements, filtering out low-quality statements, and generating proofs to create synthetic data. The DeepSeekMath 7B model is fine-tuned on this synthetic dataset, which includes 8 million formal statements with proofs. The model achieves whole-proof generation accuracies of 46.3% with 64 samples and 52% cumulatively on the Lean 4 miniP2F test, surpassing GPT-4 and a tree search reinforcement learning method. Additionally, the model successfully proved 5 out of 148 problems in the Lean 4 Formalized International Mathematical Olympiad (FIMO) benchmark, while GPT-4 failed to prove any. The paper also contributes to the community by creating and open-sourcing a large dataset of high-quality formal mathematical proofs, fostering further research in automated theorem proving.The paper "DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data" introduces a method to generate extensive synthetic proof data from high-school and undergraduate-level mathematical competition problems. The approach involves translating natural language problems into formal statements, filtering out low-quality statements, and generating proofs to create synthetic data. The DeepSeekMath 7B model is fine-tuned on this synthetic dataset, which includes 8 million formal statements with proofs. The model achieves whole-proof generation accuracies of 46.3% with 64 samples and 52% cumulatively on the Lean 4 miniP2F test, surpassing GPT-4 and a tree search reinforcement learning method. Additionally, the model successfully proved 5 out of 148 problems in the Lean 4 Formalized International Mathematical Olympiad (FIMO) benchmark, while GPT-4 failed to prove any. The paper also contributes to the community by creating and open-sourcing a large dataset of high-quality formal mathematical proofs, fostering further research in automated theorem proving.