The paper "TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts" addresses the challenge of using Large Language Models (LLMs) to prove mathematical theorems in formal languages such as Lean. The authors propose TheoremLlama, an end-to-end framework that trains a general-purpose LLM into a Lean4 expert. The framework includes methods for generating NL-FL (natural language–formal language) aligned datasets, training LLMs as formal theorem provers, and techniques for iterative proof writing. Key innovations include the NL-FL bootstrapping method, which integrates natural language reasoning into Lean4 code, along with block training and curriculum data sorting techniques that enhance LLM performance. TheoremLlama improves substantially over existing baselines, reaching 36.48% and 33.61% accuracy on the MiniF2F-Valid and MiniF2F-Test datasets, respectively. The authors also open-source their Open Bootstrapped Theorems (OBT) dataset and model checkpoints, facilitating further research in formal theorem proving.
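To make the NL-FL bootstrapping idea concrete, the sketch below shows what an NL-FL aligned training example might look like: natural language reasoning interleaved as comments inside a Lean4 proof. This is our own illustrative example, not a sample from the paper's OBT dataset; the theorem and proof are a trivial core-Lean statement chosen only to show the format.

```lean
-- Hypothetical NL-FL aligned example (illustration only, not from OBT).
-- Natural-language problem statement:
--   "Show that adding zero to any natural number n leaves it unchanged."
theorem add_zero_example (n : Nat) : n + 0 = n :=
  -- NL reasoning embedded in the formal proof:
  -- Addition on Nat recurses on its second argument, so `n + 0`
  -- reduces to `n` by definition; the proof is reflexivity.
  rfl
```

Training on data in this format is what the paper credits with transferring the LLM's informal mathematical reasoning ability into formal Lean4 proof writing.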