30 Mar 2024 | Ben Zhou, Hongming Zhang, Sihao Chen, Dian Yu, Hongwei Wang, Baolin Peng, Dan Roth, Dong Yu
This paper introduces a novel conceptualization framework to evaluate and improve the conceptual reasoning ability of large language models (LLMs). The framework forces models to perform abstract reasoning on questions by replacing specific nouns with their semantic types and generating solutions in a symbolic space. Using this framework, the authors show that existing LLMs perform significantly worse on abstract reasoning tasks compared to direct inference methods, with performance drops ranging from 9% to 28%. They propose two techniques to improve conceptual reasoning: generating similar questions with familiar nouns and using them for self-refinement. Experiments show that these techniques improve LLMs' conceptual reasoning performance by 8% to 11%, achieving a more robust and less biased reasoning system.
The conceptualization framework consists of two parts: a question abstraction process that removes induction signals by replacing specific nouns with semantic types, and a symbolic program space where LLMs generate abstract reasoning solutions. The framework is tested on various reasoning benchmarks, showing that models struggle with tasks requiring complex reasoning and planning. The authors argue that high-level abstract reasoning is key to unbiased and generalizable decision-making, and that current LLMs lack this capability.
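To make the two-part setup concrete, here is a minimal Python sketch of question abstraction followed by symbolic program generation. The `llm_generate` callable, the entity-to-type map, and the prompt wording are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of the two-stage conceptualization setup described above.
# `llm_generate` is a placeholder for whatever LLM completion call is available;
# the entity-typing map and the prompt text are hypothetical.

from typing import Callable, Dict

def abstract_question(question: str, entity_types: Dict[str, str]) -> str:
    """Replace specific nouns with their semantic types to remove induction signals."""
    abstracted = question
    for noun, semantic_type in entity_types.items():
        abstracted = abstracted.replace(noun, f"[{semantic_type}]")
    return abstracted

def generate_symbolic_program(abstracted_question: str,
                              llm_generate: Callable[[str], str]) -> str:
    """Ask the model for a solution in a symbolic (program) space rather than a
    direct natural-language answer."""
    prompt = (
        "Write a Python function solve() that answers the question below.\n"
        "Operate only on the typed placeholders; do not guess concrete values.\n\n"
        f"Question: {abstracted_question}\n"
    )
    return llm_generate(prompt)

# Example usage with a hypothetical question and typing map:
# q = "Alice gave Bob 3 apples and kept 5 apples. How many apples did Alice start with?"
# types = {"Alice": "PERSON_A", "Bob": "PERSON_B", "apples": "ITEM"}
# program = generate_symbolic_program(abstract_question(q, types), llm_generate)
```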
To improve conceptual reasoning, the authors propose using similar questions with familiar nouns as a source of trustworthy induction signals. These questions are used to select better candidate programs and to refine existing programs based on chain-of-thought (CoT) solutions. The framework is shown to be effective in improving LLMs' reasoning performance, achieving results comparable to CoT in some scenarios while being more robust and less biased.
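The selection step can be pictured as scoring each candidate program against trusted answers on concrete, familiar variants of the abstract question. The sketch below assumes hypothetical `run_program` and `cot_answer` helpers and a simple exact-match agreement criterion; the paper's actual refinement procedure may differ.

```python
# Rough sketch of using similar questions with familiar nouns to select among
# candidate symbolic programs. The CoT solver, the program-execution helper, and
# the agreement criterion are assumptions for illustration.

from typing import Callable, Dict, List

def score_candidate(program: str,
                    similar_questions: List[Dict[str, str]],
                    run_program: Callable[[str, Dict[str, str]], str],
                    cot_answer: Callable[[str], str]) -> int:
    """Count how often a candidate program reproduces the CoT answer on
    concrete, familiar instantiations of the abstract question."""
    agreements = 0
    for instance in similar_questions:
        predicted = run_program(program, instance)   # execute with concrete bindings
        trusted = cot_answer(instance["question"])   # CoT answer on the familiar question
        agreements += int(str(predicted).strip() == str(trusted).strip())
    return agreements

def select_program(candidates: List[str],
                   similar_questions: List[Dict[str, str]],
                   run_program: Callable[[str, Dict[str, str]], str],
                   cot_answer: Callable[[str], str]) -> str:
    """Keep the candidate whose abstract solution best matches trusted CoT
    answers on the familiar variants."""
    return max(candidates,
               key=lambda p: score_candidate(p, similar_questions,
                                             run_program, cot_answer))
```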
The paper also discusses related works, including decomposition-based reasoning and program-based inference methods. It highlights the importance of symbolic deduction over probabilistic induction and the need for models to perform reasoning without relying on inductive biases. The authors conclude that their proposed framework and techniques are effective in improving LLMs' conceptual reasoning ability, suggesting that future research should focus on generalizable and unbiased reasoning and planning with minimal reliance on induction.