Large Language Models (LLMs) have captured the public imagination with their impressive linguistic capabilities, but their ability to reason and plan remains questionable. LLMs are essentially giant n-gram models trained on web-scale data, functioning as enormous non-veridical memories. They excel at approximate retrieval, probabilistically reconstructing text completions, but because they do not memorize complete answers, they cannot guarantee that what they produce is correct. This same property enables their "creativity" but also leads to "hallucination."
Despite this, many researchers claim LLMs can perform planning and reasoning tasks. However, experiments show that LLMs such as GPT-3, GPT-3.5, and GPT-4 perform poorly on planning tasks, with GPT-4 achieving only about 30% accuracy in the Blocks World domain. Even this modest performance appears to stem from approximate retrieval rather than genuine planning: when the names of actions and objects in the planning problems are obfuscated while the underlying logical structure is preserved, GPT-4's performance drops significantly.
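To give a concrete sense of what this obfuscation involves, here is a toy sketch of the idea, not the actual benchmark used in the experiments: actions and objects are consistently renamed to meaningless tokens so that the problem stays logically identical while the familiar surface vocabulary disappears.

```python
import re

# Toy illustration of name obfuscation for a Blocks World-style problem.
# The particular token mapping below is made up for illustration; the point
# is that the renaming is consistent and structure-preserving.
RENAMING = {
    "unstack": "feast",
    "stack": "succumb",
    "pickup": "attack",
    "putdown": "overcome",
    "block": "object",
}

def obfuscate(problem_text: str) -> str:
    """Apply a consistent renaming of actions/objects to a problem description."""
    # Match longer names first so "unstack" is not partially rewritten as "stack".
    keys = sorted(RENAMING, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: RENAMING[m.group(1)], problem_text)

print(obfuscate("unstack block b1 from block b2, then putdown block b1"))
# -> "feast object b1 from object b2, then overcome object b1"
```

Because the renamed problem is isomorphic to the original, a genuine planner would solve it just as easily; a system relying on retrieval of familiar text would not.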
LLMs can be nudged toward better plans through techniques like fine-tuning on solved instances or prompting with hints, but these methods essentially fall back on memory-based retrieval rather than true reasoning. A more effective approach is to pair the LLM with an external, model-based plan verifier that checks each candidate plan and feeds its critique back to the LLM. This "LLM-Modulo" framework leverages LLMs for idea generation while relying on external verifiers for correctness; a minimal sketch of the loop follows.
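The sketch below shows the generate-and-critique loop under stated assumptions: the function names (`llm_propose_plan`, `verify_plan`) and the plan representation are hypothetical placeholders rather than any published LLM-Modulo interface, and the verifier is assumed to be a sound, model-based checker.

```python
# Minimal sketch of an LLM-Modulo generate/critique loop (hypothetical API).
# llm_propose_plan stands in for any LLM call that returns a candidate plan;
# verify_plan stands in for a sound, model-based plan verifier.

from typing import Callable, List, Optional, Tuple

def llm_modulo_plan(
    problem: str,
    llm_propose_plan: Callable[[str, List[str]], List[str]],
    verify_plan: Callable[[str, List[str]], Tuple[bool, str]],
    max_rounds: int = 10,
) -> Optional[List[str]]:
    """Ask the LLM for candidate plans; accept one only if the external
    verifier certifies it, otherwise back-prompt with the critique."""
    critiques: List[str] = []
    for _ in range(max_rounds):
        plan = llm_propose_plan(problem, critiques)   # idea generation
        ok, critique = verify_plan(problem, plan)     # sound verification
        if ok:
            return plan                               # verified, correct plan
        critiques.append(critique)                    # feed the critique back
    return None                                       # no verified plan found
```

Because the verifier is sound, any plan the loop returns is correct by construction; the LLM's role is confined to proposing candidates and incorporating critiques.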
While LLMs may not be capable of autonomous reasoning or planning, they can still be valuable in solving planning tasks when combined with external verifiers or expert humans in the loop. Their ability to generate plausible candidate solutions is useful within "LLM-Modulo" frameworks, where the actual planning is handled by sound components with correctness guarantees.
Many papers claiming that LLMs can plan are misleading, because they confuse general planning knowledge extracted from an LLM with an executable plan. LLMs can surface useful planning knowledge, but they cannot turn it into correct plans without external verification. The proliferation of LLM-generated travel planning books, which often fail to meet buyers' expectations, illustrates this gap.
In conclusion, LLMs are not capable of true reasoning or planning, but they can be valuable tools in "LLM-Modulo" frameworks when combined with external verifiers. Their ability to generate ideas can support reasoning and planning tasks, but they should not be credited with autonomous reasoning capabilities.