TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

3 Jul 2024 | Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang
TheoremLlama is an end-to-end framework that transforms a general-purpose large language model (LLM) into a Lean4 expert. The framework includes methods for generating NL-FL aligned datasets, training LLMs for formal theorem proving, and iterative proof writing. The key innovation is the NL-FL bootstrapping method, which integrates informal proofs into Lean4 code to enhance the LLM's ability to perform formal reasoning. The framework achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and Test datasets, surpassing the GPT-4 baseline. The OBT dataset, containing 106,852 NL-FL aligned theorems, is provided for training. The framework addresses the challenge of data scarcity in formal theorem proving by generating aligned datasets and improving LLM training techniques. The results demonstrate the effectiveness of TheoremLlama in generating formal proofs using natural language guidance. The framework is open-sourced, and all code will be made publicly available.TheoremLlama is an end-to-end framework that transforms a general-purpose large language model (LLM) into a Lean4 expert. The framework includes methods for generating NL-FL aligned datasets, training LLMs for formal theorem proving, and iterative proof writing. The key innovation is the NL-FL bootstrapping method, which integrates informal proofs into Lean4 code to enhance the LLM's ability to perform formal reasoning. The framework achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and Test datasets, surpassing the GPT-4 baseline. The OBT dataset, containing 106,852 NL-FL aligned theorems, is provided for training. The framework addresses the challenge of data scarcity in formal theorem proving by generating aligned datasets and improving LLM training techniques. The results demonstrate the effectiveness of TheoremLlama in generating formal proofs using natural language guidance. The framework is open-sourced, and all code will be made publicly available.
Reach us at info@study.space