2024 | Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy
The paper argues that large language models (LLMs) cannot, on their own, perform planning or self-verification, and it traces the misunderstandings in the literature that suggest otherwise. It proposes the LLM-Modulo framework, which pairs LLMs with external, model-based verifiers for planning and reasoning tasks: the LLM serves as an approximate knowledge source and candidate-plan generator, while sound external critics ensure formal correctness. The authors review evidence that LLMs neither reliably generate executable plans nor verify plans, underscoring the need for external verification. At the framework's core is a Generate-Test-Critique loop: the LLM produces candidate plans, external critics evaluate them, and the critics' feedback drives the next round of generation. Within this loop the LLM plays several roles, including generating plans, translating them into syntactic forms the critics can check, and helping users refine problem specifications. The paper discusses design choices, the construction of critics/verifiers, and the place of humans in the framework. Case studies in classical planning and travel planning demonstrate the framework's effectiveness, and the authors conclude by arguing that this is the right way to leverage LLMs for robust planning and reasoning.
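To make the loop concrete, here is a minimal Python sketch of the Generate-Test-Critique cycle under the paper's division of labor (LLM proposes, external critics dispose). The identifiers (`llm_modulo_plan`, `Critique`, the `generate` and `critics` callables) are hypothetical placeholders, since the paper specifies an architecture rather than an API.

```python
# Minimal sketch of the LLM-Modulo Generate-Test-Critique loop.
# All names are illustrative placeholders, not APIs from the paper.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Critique:
    """Feedback from one external critic: pass/fail plus an explanation."""
    critic_name: str
    ok: bool
    feedback: str = ""


def llm_modulo_plan(
    problem: str,
    generate: Callable[[str, List[Critique]], str],    # LLM: candidate-plan generator
    critics: List[Callable[[str, str], Critique]],     # external, model-based verifiers
    max_iters: int = 10,
) -> Optional[str]:
    """Return a plan only once every external critic accepts it.

    The LLM is treated purely as an approximate knowledge source;
    correctness guarantees come from the sound critics, not the LLM.
    """
    critiques: List[Critique] = []
    for _ in range(max_iters):
        candidate = generate(problem, critiques)              # Generate
        critiques = [c(problem, candidate) for c in critics]  # Test
        if all(c.ok for c in critiques):
            return candidate                                  # verified plan
        critiques = [c for c in critiques if not c.ok]        # Critique: feed failures back
    return None  # no plan certified within the iteration budget
```

In the paper's classical-planning case study, the critics would be model-based verifiers (e.g., a PDDL plan validator), while softer critics can flag style or preference issues; the loop terminates only on a plan the hard critics certify.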