RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation

2024 | Hanxiao Jiang, Binghao Huang, Ruihai Wu, Zhuoran Li, Shubham Garg, Hooshang Nayyeri, Shenlong Wang, Yunzhu Li
RoboEXP is a robotic exploration system that lets a robot autonomously explore an environment and build an action-conditioned scene graph (ACSG) representing its structure. The ACSG captures both low-level information (geometry and semantics) and high-level information (action-conditioned relationships between entities). The system combines a Large Multimodal Model (LMM) with an explicit memory design. The robot reasons about which objects to explore and how, accumulates information through interaction, and builds the ACSG incrementally. RoboEXP handles diverse exploration tasks in a zero-shot manner and constructs complex ACSGs in real-world scenes containing rigid, articulated, nested, and deformable objects, including scenes with obstructing objects and scenes that require multi-step reasoning. It also copes with human interventions and supports multiple complex downstream manipulation tasks. The ACSG extends LLM/LMM-guided manipulation and decision-making to complex environments with unknown or unobserved objects. The work introduces an active exploration strategy for manipulation driven by a scene graph-guided objective, and the resulting action-conditioned 3D scene graph guides the downstream manipulation tasks. Experiments cover tabletop and mobile-robot scenarios and compare RoboEXP against a strong baseline using success rate, object recovery, state recovery, unexplored space, and graph edit distance.
RoboEXP outperforms the baseline on all metrics and adapts to environmental changes and human interventions. Its main limitations are detection and segmentation errors in the perception module, which can be addressed by improving visual foundation models and enhancing LMM capacities. The authors conclude that the system provides a foundation for practical robotic deployment in complex settings, facilitating integration into everyday human activities.
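To make the idea of an action-conditioned scene graph more concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the `ACSG` class, its method names, and the relation labels `"affords"` and `"reveals"` are all hypothetical, and geometry and semantics are left out entirely. It shows how recording which action exposed which object lets a planner recover the sequence of actions needed to reach a nested object.

```python
from dataclasses import dataclass, field


@dataclass
class ACSG:
    """Toy action-conditioned scene graph (hypothetical structure, for illustration)."""

    objects: set[str] = field(default_factory=set)
    actions: set[str] = field(default_factory=set)
    # Each edge is (source, relation, target); the relation names are assumptions.
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, name: str) -> None:
        self.objects.add(name)

    def add_action(self, obj: str, action: str) -> None:
        """Record that `action` can be applied to `obj`."""
        self.actions.add(action)
        self.edges.append((obj, "affords", action))

    def add_revealed(self, action: str, obj: str) -> None:
        """Record that performing `action` exposed a previously hidden `obj`."""
        self.add_object(obj)
        self.edges.append((action, "reveals", obj))

    def plan_to_reach(self, target: str) -> list[str]:
        """Return the actions, in execution order, needed to expose `target`."""
        plan: list[str] = []
        visited: set[str] = set()
        current = target
        while current not in visited:
            visited.add(current)
            # Find an action that revealed the current object, if any.
            revealing = [s for s, r, t in self.edges if r == "reveals" and t == current]
            if not revealing:
                break  # The object is directly visible; no further actions needed.
            action = revealing[0]
            plan.append(action)
            # Step back to the object that this action is applied to.
            owners = [s for s, r, t in self.edges if r == "affords" and t == action]
            if not owners:
                break
            current = owners[0]
        return list(reversed(plan))


# A spoon inside a box inside a cabinet, discovered through interaction.
graph = ACSG()
graph.add_object("cabinet")
graph.add_action("cabinet", "open_cabinet")
graph.add_revealed("open_cabinet", "box")
graph.add_action("box", "open_box")
graph.add_revealed("open_box", "spoon")

print(graph.plan_to_reach("spoon"))  # ['open_cabinet', 'open_box']
```

In this toy version, the graph grows only when an interaction reveals something new, which mirrors the incremental construction described above. A downstream task such as fetching the spoon then reduces to walking the graph backward from the target.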