20 Mar 2024 | Dongwei Jiang, Marcio Fonseca, Shay B. Cohen
**LeanReasoner: Boosting Complex Logical Reasoning with Lean**
**Dongwei Jiang, Marcio Fonseca, Shay B. Cohen**
**Johns Hopkins University, University of Edinburgh**
**Abstract:**
Large language models (LLMs) often struggle with complex logical reasoning because of logical inconsistencies and the inherent difficulty of such reasoning. We use Lean, a theorem-proving framework, to address these challenges. By formalizing logical reasoning problems as theorems in Lean, we can solve them by proving or disproving the corresponding theorems. This method reduces the risk of logical inconsistencies with the help of Lean's symbolic solver, and it lets us tackle more complex reasoning tasks by drawing on Lean's extensive library of theorem proofs. Our method achieves state-of-the-art performance on the FOLIO dataset and near-state-of-the-art performance on ProofWriter. Notably, these results were achieved by fine-tuning on fewer than 100 in-domain samples for each dataset.
**Introduction:**
Logical reasoning, a core human capability, has long been a challenge for machine learning systems. LLMs, despite their impressive abilities in natural language understanding and generation, often fall short on complex logical reasoning tasks, frequently suffering from logical inconsistencies that lead to spurious results. Recent work has adopted a structured approach to these problems, splitting them into two stages: symbolic formalization and problem solving. The symbolic solver acts as a rigorous checkpoint, ensuring that the model's outputs obey logical rules.
**LeanReasoner:**
Our framework, LeanReasoner, is composed of four main components: a formalizer, a tactic generator, a proof search mechanism, and a result interpreter. The formalizer converts the natural-language context and question into a formalized context and a formalized question. The tactic generator produces tactics based on premises extracted from the formalized context. The proof search mechanism oversees tactic execution and goal expansion, and the result interpreter analyzes the output of the proof search to identify the correct answer.
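To make the pipeline concrete, a toy ProofWriter-style problem might be formalized as follows (a minimal Lean sketch; the names `obj`, `Charlie`, `green`, and `rough` are illustrative, not taken from our datasets):

```lean
-- Formalized context: declare the domain, facts, and rules.
constant obj : Type
constant Charlie : obj
constant green : obj → Prop
constant rough : obj → Prop

axiom A1 : green Charlie              -- "Charlie is green."
axiom A2 : ∀ x, green x → rough x     -- "Anything green is rough."

-- Formalized question, stated as a theorem. The proof applies
-- rule A2 to fact A1; the result interpreter reads this as "True".
theorem charlie_is_rough : rough Charlie := A2 Charlie A1
```

If the goal cannot be proved but its negation can, the answer is interpreted as false; if neither proof is found, the answer is unknown.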
**Experimental Setup:**
We evaluated on two common logical reasoning datasets: ProofWriter and FOLIO. For model training, we collected 100 theorem proofs for ProofWriter and 27 for FOLIO. During data annotation, we adopted two contrasting styles: a detailed one that spells out all intermediate steps and lemmas, and a more succinct one similar to mathlib annotations.
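The two annotation styles can be illustrated on a simple logical lemma (a self-contained Lean sketch, not an example from our annotated data):

```lean
-- Detailed style: every intermediate step is named and derived explicitly.
theorem imp_trans_detailed (p q r : Prop) (hpq : p → q) (hqr : q → r) : p → r :=
begin
  intro hp,               -- assume p
  have hq : q := hpq hp,  -- intermediate step: q follows from p
  exact hqr hq,           -- conclude r
end

-- Succinct, mathlib-like style: the same proof as a single term.
theorem imp_trans_succinct (p q r : Prop) (hpq : p → q) (hqr : q → r) : p → r :=
λ hp, hqr (hpq hp)
```

The detailed style gives the tactic generator more fine-grained supervision per step, while the succinct style matches the conventions of the mathlib proofs the model may have seen during pretraining.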
**Results:**
Our approach yields near-perfect accuracy on ProofWriter while using significantly less training data than prior methods. On FOLIO, it achieves state-of-the-art performance. We also compared against other baselines, highlighting the strengths of our model.
**Conclusion:**
We introduced LeanReasoner, a framework based on Lean that augments the logical reasoning abilities of LLMs. Our method effectively uses the logical "nuggets" found in mathematical theorem proofs, enhancing the overall accuracy of logical reasoning tasks.