8 Mar 2022 | Wenlong Huang, Pieter Abbeel, Deepak Pathak*, Igor Mordatch*
This paper explores whether large language models (LLMs) can be used to generate actionable plans for embodied agents in interactive environments. The authors investigate grounding high-level tasks, such as "make breakfast," into sequences of actionable steps like "open fridge." They find that appropriately prompted pre-trained LLMs can decompose high-level tasks into plausible mid-level plans without any additional training. However, these plans often lack precision and are not directly executable in the environment. To address this, the authors propose a procedure that conditions on existing demonstrations and semantically translates each generated step into the closest admissible action. Evaluation in the VirtualHome environment shows that this method substantially improves the executability of the generated plans over the baseline LLMs, while human evaluations reveal a trade-off between executability and correctness, indicating promising progress in extracting actionable knowledge from language models. The paper's contributions are demonstrating that LLMs can generate plausible action plans and proposing techniques that improve their executability without invasive modifications to the models.
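The semantic-translation step can be illustrated with off-the-shelf sentence embeddings: each free-form step produced by the planning LM is mapped to the admissible action whose embedding is most similar under cosine similarity. Below is a minimal sketch assuming the `sentence-transformers` library; the model name, action list, and example step are illustrative stand-ins, not taken from the paper.

```python
# Sketch of the semantic-translation step: map each free-form plan step
# from the planning LM to the closest admissible environment action.
# Assumes the `sentence-transformers` package; the action list and the
# example step below are hypothetical, not from the paper.
from sentence_transformers import SentenceTransformer, util

# An off-the-shelf embedding model stands in for the paper's Translation LM.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical set of actions the environment can actually execute.
admissible_actions = [
    "open fridge",
    "grab milk",
    "close fridge",
    "walk to kitchen",
    "switch on stove",
]
action_embeddings = model.encode(admissible_actions, convert_to_tensor=True)

def translate_step(generated_step: str) -> str:
    """Return the admissible action most similar to the LM's free-form step."""
    step_embedding = model.encode(generated_step, convert_to_tensor=True)
    scores = util.cos_sim(step_embedding, action_embeddings)[0]
    return admissible_actions[int(scores.argmax())]

# A loose step from the planning LM is grounded to an executable action.
print(translate_step("take the milk out of the refrigerator"))  # -> "grab milk"
```

Because every output is drawn from the set of admissible actions, executability is guaranteed by construction; the trade-off the paper observes arises when the nearest admissible action drifts in meaning from the step the planner intended.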