Large Language Models (LLMs) have captured the public imagination with their impressive linguistic capabilities, but their ability to reason and plan remains questionable. LLMs are essentially giant n-gram models trained on web-scale data, functioning as enormous non-veridical memories. They excel at approximate retrieval, probabilistically reconstructing text completions, but because they do not memorize complete answers, they cannot guarantee that what they produce is correct. This same property enables their "creativity" but also leads to "hallucination."
Despite this, many researchers claim LLMs can perform planning and reasoning tasks. However, experiments show that LLMs such as GPT-3, GPT-3.5, and GPT-4 perform poorly on planning tasks, with GPT-4 achieving only about 30% accuracy in the Blocks World domain. Even this modest performance appears to stem from approximate retrieval rather than genuine planning: when the names of actions and objects in the planning problems are obfuscated while the underlying logical structure is preserved, GPT-4's performance drops significantly.
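To give a concrete sense of what this obfuscation involves, here is a toy sketch of the idea, not the actual benchmark used in the experiments: actions and objects are consistently renamed to meaningless tokens so that the problem stays logically identical while the familiar surface vocabulary disappears.

```python
import re

# Toy illustration of name obfuscation for a Blocks World-style problem.
# The particular token mapping below is made up for illustration; the point
# is that the renaming is consistent and structure-preserving.
RENAMING = {
    "unstack": "feast",
    "stack": "succumb",
    "pickup": "attack",
    "putdown": "overcome",
    "block": "object",
}

def obfuscate(problem_text: str) -> str:
    """Apply a consistent renaming of actions/objects to a problem description."""
    # Match longer names first so "unstack" is not partially rewritten as "stack".
    keys = sorted(RENAMING, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: RENAMING[m.group(1)], problem_text)

print(obfuscate("unstack block b1 from block b2, then putdown block b1"))
# -> "feast object b1 from object b2, then overcome object b1"
```

Because the renamed problem is isomorphic to the original, a genuine planner would solve it just as easily; a system relying on retrieval of familiar text would not.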
LLMs can be nudged toward better plans through techniques like fine-tuning on solved instances or prompting with hints, but these methods essentially fall back on memory-based retrieval rather than true reasoning. A more effective approach is to pair the LLM with an external, model-based plan verifier that checks each candidate plan and feeds its critique back to the LLM. This "LLM-Modulo" framework leverages LLMs for idea generation while relying on external verifiers for correctness; a minimal sketch of the loop follows.
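The sketch below shows the generate-and-critique loop under stated assumptions: the function names (`llm_propose_plan`, `verify_plan`) and the plan representation are hypothetical placeholders rather than any published LLM-Modulo interface, and the verifier is assumed to be a sound, model-based checker.

```python
# Minimal sketch of an LLM-Modulo generate/critique loop (hypothetical API).
# llm_propose_plan stands in for any LLM call that returns a candidate plan;
# verify_plan stands in for a sound, model-based plan verifier.

from typing import Callable, List, Optional, Tuple

def llm_modulo_plan(
    problem: str,
    llm_propose_plan: Callable[[str, List[str]], List[str]],
    verify_plan: Callable[[str, List[str]], Tuple[bool, str]],
    max_rounds: int = 10,
) -> Optional[List[str]]:
    """Ask the LLM for candidate plans; accept one only if the external
    verifier certifies it, otherwise back-prompt with the critique."""
    critiques: List[str] = []
    for _ in range(max_rounds):
        plan = llm_propose_plan(problem, critiques)   # idea generation
        ok, critique = verify_plan(problem, plan)     # sound verification
        if ok:
            return plan                               # verified, correct plan
        critiques.append(critique)                    # feed the critique back
    return None                                       # no verified plan found
```

Because the verifier is sound, any plan the loop returns is correct by construction; the LLM's role is confined to proposing candidates and incorporating critiques.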
While LLMs may not be capable of autonomous reasoning or planning, they can still be valuable in solving planning tasks when combined with external verifiers or expert humans in the loop. Their ability to generate plausible candidate solutions is useful within "LLM-Modulo" frameworks, where the actual planning is handled by sound components with correctness guarantees.
Many papers claiming that LLMs can plan are misleading, because they confuse general planning knowledge extracted from an LLM with an executable plan. LLMs can surface useful planning knowledge, but they cannot turn it into correct plans without external verification. The proliferation of LLM-generated travel planning books, which often fail to meet buyers' expectations, illustrates this gap.
In conclusion, LLMs are not capable of true reasoning or planning, but they can be valuable tools in "LLM-Modulo" frameworks when combined with external verifiers. Their ability to generate ideas can support reasoning and planning tasks, but they should not be credited with autonomous reasoning capabilities.