Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs

21 Jun 2024 | Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren
This paper investigates the ability of large language models (LLMs) to reason with logical inferential rules. It proposes Logic Scaffolding Inferential Rule Generation (LOIRE), a framework for constructing ULogic, an inferential rule base containing 8,000 primitive rules and over 6,000 compositional rules across five domains.

Stress-testing GPT-series models (GPT-4, GPT-3.5-Turbo, and GPT-3.5-Turbo-Instruct) against ULogic shows that while they perform well on basic reasoning tasks and display a basic understanding of inferential rules, they fall short of human proficiency: they struggle with complex, structurally intricate, and symbolic rules, especially rules with complex premises, and they exhibit biases toward certain types of reasoning.

The authors further distill ULogic into a smaller-scale inference engine that generates accurate, complex, and abstract conclusions and premises. This engine outperforms GPT-3.5-Turbo on all evaluated aspects, surpasses GPT-4 in generating more complex and abstract rules, and improves downstream commonsense reasoning tasks.

Overall, the work provides a resource for assessing LLMs' proficiency in underlying logic, supports more flexible rule generation and reasoning, and points to future directions for strengthening LLMs' logical reasoning abilities.
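The summary does not reproduce ULogic's actual rule schema, so the following is a minimal sketch, assuming a Horn-clause-style format in which a conjunction of premise predicates entails a conclusion and compositional rules are built by chaining primitive ones. The names `Predicate`, `Rule`, `compose`, and `verbalize` are illustrative, not the paper's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:
    # e.g. Predicate("Owns", ("X", "Y")) represents Owns(X, Y)
    name: str
    args: tuple

    def __str__(self):
        return f"{self.name}({', '.join(self.args)})"

@dataclass(frozen=True)
class Rule:
    # The premises jointly entail the conclusion: p1 ∧ p2 ∧ ... -> c
    premises: tuple
    conclusion: Predicate

    def __str__(self):
        body = " ∧ ".join(str(p) for p in self.premises)
        return f"{body} -> {self.conclusion}"

def compose(outer: Rule, inner: Rule) -> Rule:
    """Chain two primitive rules into a compositional rule by replacing
    the premise of `outer` that matches `inner`'s conclusion with
    `inner`'s premises (a hypothetical composition step)."""
    new_premises, replaced = [], False
    for p in outer.premises:
        if not replaced and p == inner.conclusion:
            new_premises.extend(inner.premises)
            replaced = True
        else:
            new_premises.append(p)
    if not replaced:
        raise ValueError("inner rule's conclusion matches no premise of outer")
    return Rule(tuple(new_premises), outer.conclusion)

def verbalize(rule: Rule) -> str:
    """Render a rule as an 'If ..., then ...' statement, the kind of
    probe one could pose to an LLM to test whether it accepts the rule."""
    body = " and ".join(str(p) for p in rule.premises)
    return f"If {body}, then {rule.conclusion}."

# Example: compose two primitive rules into a two-hop compositional rule.
r1 = Rule((Predicate("Buys", ("X", "Y")),), Predicate("Owns", ("X", "Y")))
r2 = Rule((Predicate("Owns", ("X", "Y")),), Predicate("CanSell", ("X", "Y")))
composed = compose(r2, r1)   # Buys(X, Y) -> CanSell(X, Y)
print(composed)
print(verbalize(composed))
```

Verbalizing composed rules this way mirrors the stress-testing setup the paper describes: longer premise chains yield structurally more intricate rules, which is where the summary reports LLM performance degrades.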