Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

8 Mar 2022 | Wenlong Huang, Pieter Abbeel, Deepak Pathak*, Igor Mordatch*
This paper explores whether large language models (LLMs) can provide actionable knowledge for embodied agents in interactive environments. The authors ask whether an LLM can decompose a high-level task, such as "make breakfast," into a sequence of actionable steps, like "open fridge," without any further training. They find that while LLMs can generate plausible action plans, these plans are often not executable in the environment because of free-form phrasing and mismatches with the set of admissible actions.

To improve executability, the authors propose a procedure that conditions the LLM on an existing demonstration and semantically translates each generated step to an admissible action. They evaluate the method in VirtualHome, a simulated household environment that supports a variety of human activities. Their results show that the proposed procedure significantly improves the executability of LLM-generated action plans over the LLM baseline, though it introduces a trade-off between executability and semantic correctness. The authors also compare different translation models and find that a pre-trained translation model improves the accuracy of the resulting plans, and they observe that larger language models tend to generate more expressive plans that are nonetheless less executable. The study thus highlights action translation as the key step in extracting actionable knowledge from LLMs. Overall, the paper demonstrates that LLMs can supply actionable knowledge for embodied agents, but further research is needed to balance executability and correctness.

The authors conclude that their method offers a promising approach for grounding LLMs in embodied environments, while more work is needed to reach human-level performance.
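The semantic translation step lends itself to a brief illustration. Below is a minimal sketch, assuming the translation is done by matching a free-form LLM step to the most similar admissible action via cosine similarity of sentence embeddings; the specific embedding checkpoint, the `admissible_actions` list, and the `translate_step` helper are illustrative assumptions rather than the authors' exact implementation.

```python
# Minimal sketch: map a free-form LLM step to an admissible, VirtualHome-style
# action by cosine similarity of sentence embeddings (illustrative only; not
# the authors' code).
from sentence_transformers import SentenceTransformer, util

# Hypothetical list of admissible actions exposed by the environment.
admissible_actions = [
    "walk to kitchen",
    "open fridge",
    "grab milk",
    "close fridge",
    "walk to table",
    "put milk on table",
]

# Any pre-trained sentence-embedding model; this checkpoint is an assumption.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
action_embeddings = embedder.encode(admissible_actions, convert_to_tensor=True)

def translate_step(free_form_step: str) -> tuple[str, float]:
    """Return the admissible action closest in embedding space to the LLM's step."""
    step_embedding = embedder.encode(free_form_step, convert_to_tensor=True)
    scores = util.cos_sim(step_embedding, action_embeddings)[0]
    best = int(scores.argmax())
    return admissible_actions[best], float(scores[best])

# Example: the LLM may phrase a step loosely; translation grounds it.
action, score = translate_step("take the milk out of the refrigerator")
print(action, round(score, 3))  # e.g. "grab milk" plus its similarity score
```

In the paper, each translated step is also appended back to the prompt so that the next step is generated conditioned on admissible actions; the sketch above shows only the matching half of that loop.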