LeanReasoner: Boosting Complex Logical Reasoning with Lean

20 Mar 2024 | Dongwei Jiang, Marcio Fonseca, Shay B. Cohen
LeanReasoner is a framework that leverages the theorem prover Lean to enhance the logical reasoning capabilities of large language models (LLMs). The framework formalizes logical reasoning problems into theorems within Lean, enabling these theorems to be verified through symbolic proving. This approach reduces logical inconsistencies and improves the ability to handle complex reasoning tasks. LeanReasoner achieves state-of-the-art performance on the FOLIO dataset and near-state-of-the-art performance on ProofWriter, using fewer than 100 in-domain samples for each dataset.

The framework consists of four main components: a formalizer, a tactic generator, a proof search mechanism, and a result interpreter. The formalizer converts natural language context into formalized Lean context using LLMs such as GPT-3 and GPT-4. The tactic generator creates tactics based on premises extracted from the formalized context. The proof search mechanism oversees tactic execution and goal expansion, while the result interpreter analyzes the output of the proof search to identify the correct answer.

LeanReasoner outperforms other baselines on logical reasoning tasks, particularly on the FOLIO dataset. It achieves near-perfect accuracy on ProofWriter with significantly less data, demonstrating the effectiveness of pretraining on theorem-proving data. The framework also shows improved performance when concise annotations are used for premise selection.

However, LeanReasoner faces challenges with problems involving commonsense and factual reasoning, as well as with complex reasoning datasets such as TheoremQA. The framework is also less effective on symbolic problems that involve constraints and many possible variable assignments, where a Constraint Satisfaction Problem (CSP) solver might be more suitable. The integration of theorem-proving frameworks with LLMs has the potential to advance logical reasoning.
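To make the formalization step concrete, the following is a minimal sketch (invented for illustration, not taken from the paper) of how a FOLIO-style problem might be encoded in Lean. Each natural-language premise becomes an axiom, and the question becomes a theorem whose proof the tactic generator and proof search would attempt to construct; all names here are hypothetical.

```lean
-- Hypothetical problem: "All dogs are mammals. Rex is a dog.
-- Is Rex a mammal?"

-- The formalizer would introduce a domain and predicates:
axiom Entity : Type
axiom Dog : Entity → Prop
axiom Mammal : Entity → Prop
axiom rex : Entity

-- Each premise from the context becomes an axiom:
axiom all_dogs_are_mammals : ∀ x, Dog x → Mammal x
axiom rex_is_a_dog : Dog rex

-- The question becomes a goal; the proof is what the
-- tactic generator and proof search would need to find:
theorem rex_is_a_mammal : Mammal rex :=
  all_dogs_are_mammals rex rex_is_a_dog
```

If the proof search succeeds on the goal (or on its negation), the result interpreter can map that outcome back to an answer choice such as "true", "false", or "unknown".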
However, ethical considerations regarding data biases and the reliability of conclusions derived from LLMs must be addressed. The study highlights the importance of rigorous checks and unbiased reasoning in the development of logical reasoning systems.