7 Jun 2024 | Huaiyuan Ying12*, Zijian Wu13*, Yihan Geng14*, Jiayu Wang1, Dahua Lin1, Kai Chen1
The paper introduces a novel pipeline for translating natural language mathematical problems into formal Lean 4 statements and vice versa, addressing the challenge of data sparsity in formal theorem proving. The authors propose an iterative autoformalization pipeline that generates and filters synthetic data, using active learning to improve the model's performance. The pipeline involves collecting and translating problems from math contest forums, verifying their correctness using Lean 4, and back-translating to natural language for human evaluation. After multiple iterations, the final dataset contains 57K formal-informal question pairs and 21 new IMO questions. The authors open-source their code and dataset, aiming to enhance the ability of large language models (LLMs) in autoformalization and automatic theorem proving. The evaluation metrics include compile pass number (CPN), NLI pass number (NPN), and correct translation rate, showing significant improvements over initial models. The paper also discusses common translation errors and provides examples of translated problems from various mathematical areas, including algebra, number theory, and combinatorics.The paper introduces a novel pipeline for translating natural language mathematical problems into formal Lean 4 statements and vice versa, addressing the challenge of data sparsity in formal theorem proving. The authors propose an iterative autoformalization pipeline that generates and filters synthetic data, using active learning to improve the model's performance. The pipeline involves collecting and translating problems from math contest forums, verifying their correctness using Lean 4, and back-translating to natural language for human evaluation. After multiple iterations, the final dataset contains 57K formal-informal question pairs and 21 new IMO questions. The authors open-source their code and dataset, aiming to enhance the ability of large language models (LLMs) in autoformalization and automatic theorem proving. The evaluation metrics include compile pass number (CPN), NLI pass number (NPN), and correct translation rate, showing significant improvements over initial models. The paper also discusses common translation errors and provides examples of translated problems from various mathematical areas, including algebra, number theory, and combinatorics.