AutoGPT+P: Affordance-based Task Planning with Large Language Models

23 Jul 2024 | Timo Birr, Christoph Pohl, Abdelrahman Younes and Tamim Asfour
AutoGPT+P is a system that combines an affordance-based scene representation with a planning system to address the challenge of dynamically capturing the initial state of task planning problems. Affordances, which represent the action possibilities an agent has in its environment, enable symbolic planning with arbitrary objects. AutoGPT+P leverages this representation to derive and execute a plan for a task specified by the user in natural language. It can handle planning with incomplete information by exploring the scene, suggesting alternatives, or providing a partial plan.

The system uses an Object Affordance Mapping (OAM), automatically generated with ChatGPT, to combine object detection with the affordance-based scene representation. The core planning tool extends existing work by automatically correcting semantic and syntactic errors, achieving a success rate of 98% on the SayCan instruction set. On a dataset of 150 scenarios, the system achieves a success rate of 79%. AutoGPT+P handles tasks with missing objects by searching the environment for them, proposing alternatives, or progressing towards a subgoal; it also allows the robot to seek assistance from humans when needed. Affordances are used to dynamically deduce the actions viable in a given scene, facilitating the formation of a plan that achieves the user's objective.

AutoGPT+P consists of two stages: scene perception and task planning. The first stage perceives the environment as a set of objects and extracts scene affordances from visual data. The second stage plans the task based on the established affordance-based scene representation and the user's specified goal, using an LLM to select tools that support generating a plan to accomplish the task. The system was evaluated in simulation on 180 scenarios and validated on a humanoid robot.
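The Object Affordance Mapping described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the object classes, affordance names, and data structures below are assumptions chosen for readability.

```python
# Minimal sketch of an Object Affordance Mapping (OAM). The classes and
# affordance labels here are illustrative assumptions, not the paper's data.
from dataclasses import dataclass

# Maps each detectable object class to the affordances it offers an agent.
OAM = {
    "cup":   {"graspable", "pourable-into", "drink-from"},
    "knife": {"graspable", "cut-with"},
    "table": {"placeable-on"},
}

@dataclass
class SceneObject:
    name: str  # instance id from the object detector, e.g. "cup_1"
    cls: str   # detected object class, e.g. "cup"

def scene_affordances(objects):
    """Derive the affordance-based scene representation: every
    (object instance, affordance) pair the current scene offers."""
    return {(o.name, a) for o in objects for a in OAM.get(o.cls, set())}

scene = [SceneObject("cup_1", "cup"), SceneObject("table_1", "table")]
print(sorted(scene_affordances(scene)))
```

Because the mapping is keyed by object class rather than by a fixed object list, any newly detected object with a known class immediately contributes symbolic actions to the planner, which is what allows planning with arbitrary objects.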
The main contributions of this work include a novel affordance-based scene representation, a task planning approach based on LLM-driven tool selection, an extension of the LLM+P planning approach with automated error correction, and real-world validation experiments with a humanoid robot. The system outperforms existing approaches in handling tasks with missing objects and provides a more dynamic and flexible planning solution.
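The automated error correction that extends LLM+P can be pictured as a generate-validate-retry loop: the LLM drafts a planning problem, a validator and symbolic planner check it, and any errors are fed back for a corrected attempt. The sketch below is a hypothetical outline of that loop; `query_llm`, `run_planner`, and `validate_pddl` are placeholder callables, not APIs from the paper.

```python
# Hedged sketch of an LLM+P-style loop with automated error correction.
# All callables (query_llm, run_planner, validate_pddl) are illustrative
# placeholders to be supplied by the caller.
def plan_with_correction(goal, scene, query_llm, run_planner, validate_pddl,
                         max_retries=3):
    """Ask the LLM for a PDDL problem, validate it, and feed syntactic or
    semantic errors back to the LLM until a plan is found or retries run out."""
    prompt = f"Write a PDDL problem for goal: {goal}\nScene: {scene}"
    for _ in range(max_retries):
        problem = query_llm(prompt)
        errors = validate_pddl(problem)      # syntax + semantic checks
        if not errors:
            plan = run_planner(problem)
            if plan is not None:
                return plan                  # success: a symbolic plan
            errors = ["planner found the goal unreachable"]
        # Append the error feedback and ask the LLM to correct its output.
        prompt += f"\nYour previous attempt failed: {errors}. Please fix it."
    return None                              # give up after max_retries
```

Keeping the planner symbolic while letting the LLM only write the problem description is the design choice that makes correction tractable: validator and planner errors are concrete, machine-checkable signals that can be appended to the prompt.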