2024 | Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony G. Cohn, Janet B. Pierrehumbert
This paper explores the capabilities of large language models (LLMs) in asynchronous plan reasoning, a task that requires both sequential and parallel planning to minimize total time cost. The study finds that LLMs, including GPT-4 and LLaMA-2, perform poorly when not given a detailed illustration of the task-solving process. To address this, the authors propose a novel prompting technique called *Plan Like a Graph* (PLaG), which combines graph representations with natural language prompts. PLaG significantly improves model performance, but even with it, LLMs still degrade as task complexity increases, highlighting the limitations of treating LLMs as digital devices. The paper also introduces AsyncHow, a benchmark for evaluating LLMs on asynchronous planning. The main contributions are the AsyncHow benchmark, the PLaG technique, and a demonstration of the limits of LLMs on complex planning tasks. The code and data for the benchmark are available online.
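To make the idea concrete, here is a minimal, hypothetical sketch of a PLaG-style setup (the paper's exact prompt format and wording may differ; the recipe task, step names, and function names below are illustrative assumptions). An asynchronous plan is modeled as a dependency DAG over timed steps; the DAG is serialized into the prompt alongside the natural-language question, and the optimal parallel completion time, which the model should recover, is the length of the critical (longest) path through the graph.

```python
# Hypothetical recipe-style task: step durations (minutes) and prerequisites.
durations = {"boil water": 5, "chop vegetables": 8, "cook pasta": 10, "plate": 2}
depends_on = {"cook pasta": ["boil water"], "plate": ["cook pasta", "chop vegetables"]}

def critical_path_time(durations, depends_on):
    """Optimal parallel completion time = longest path through the dependency DAG."""
    memo = {}
    def finish(step):
        # Earliest finish time of `step`: its duration plus the latest prerequisite finish.
        if step not in memo:
            prereqs = depends_on.get(step, [])
            memo[step] = durations[step] + max((finish(p) for p in prereqs), default=0)
        return memo[step]
    return max(finish(s) for s in durations)

def plag_style_prompt(durations, depends_on):
    """Serialize the plan as an edge list appended to a natural-language question."""
    steps = "; ".join(f"{s} ({d} min)" for s, d in durations.items())
    edges = ", ".join(f"{p} -> {s}" for s, ps in depends_on.items() for p in ps)
    return (f"Steps: {steps}.\nDependency graph: {edges}.\n"
            "Steps with no dependency between them can run in parallel. "
            "What is the shortest total time to complete the task?")

print(plag_style_prompt(durations, depends_on))
print("Ground truth:", critical_path_time(durations, depends_on), "min")  # -> 17 min
```

The edge-list serialization used here is just one possible graph encoding; the point of PLaG is that supplying an explicit graph structure, rather than prose alone, helps the model track which steps can overlap.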