CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models

2024 | Haoxu Huang2,3,4*, Fanqi Lin1,2,4*, Yingdong Hu1,2,4, Shengjie Wang1,2,4, Yang Gao1,2,4
CoPa is a framework that leverages foundation models to generate a sequence of 6-DoF end-effector poses for robotic manipulation tasks. It decomposes the manipulation process into two phases: task-oriented grasping and task-aware motion planning. In the grasping phase, a coarse-to-fine grounding mechanism selects the part of the object to grasp. In the motion planning phase, vision-language models (VLMs) identify the spatial geometry constraints of task-relevant object parts, and these constraints are then used to derive post-grasp poses. CoPa demonstrates a fine-grained physical understanding of scenes, which lets it handle open-set instructions and objects with minimal prompt engineering and without additional training. Extensive real-world experiments show that CoPa outperforms baselines in completing complex manipulation tasks, showcasing its ability to generalize to open-world scenarios. The framework can also be integrated with high-level planning methods to accomplish long-horizon tasks.
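To make the idea of deriving a pose from a spatial constraint more concrete, here is a minimal sketch of one hypothetical constraint type: "an axis of a grasped part should be parallel to a target direction". The summary above does not say how CoPa represents or solves its constraints, so the function name `rotation_aligning`, the choice of constraint, and the closed-form Rodrigues solution are all assumptions made purely for illustration. They are not the authors' method.

```python
import math

# Hypothetical helpers, not taken from CoPa: plain 3-vector arithmetic.
def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def matvec(m, v):
    return [dot(row, v) for row in m]

def rotation_aligning(src, dst):
    """Return a 3x3 rotation matrix R such that R applied to src points along dst.

    Uses Rodrigues' formula R = I + K + K^2 * (1 - c) / s^2, where K is the
    skew-symmetric matrix of v = a x b, c = a . b, and s = |v|.
    """
    a, b = normalize(src), normalize(dst)
    v = cross(a, b)
    c = dot(a, b)
    s2 = dot(v, v)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if s2 < 1e-12:
        if c > 0:
            return identity  # already aligned
        # Antiparallel case: rotate 180 degrees about any axis perpendicular to a.
        helper = [1.0, 0.0, 0.0] if abs(a[0]) < 0.9 else [0.0, 1.0, 0.0]
        u = normalize(cross(a, helper))
        return [[2.0 * u[i] * u[j] - identity[i][j] for j in range(3)]
                for i in range(3)]
    k = [[0.0, -v[2], v[1]],
         [v[2], 0.0, -v[0]],
         [-v[1], v[0], 0.0]]
    k2 = [[sum(k[i][m] * k[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    f = (1.0 - c) / s2
    return [[identity[i][j] + k[i][j] + f * k2[i][j] for j in range(3)]
            for i in range(3)]

# Illustrative use: a part axis currently along +x should point straight down (-z).
R = rotation_aligning([1.0, 0.0, 0.0], [0.0, 0.0, -1.0])
aligned_axis = matvec(R, [1.0, 0.0, 0.0])
```

A rotation like `R` fixes only the orientation that satisfies this single constraint. A full 6-DoF post-grasp pose would also need a translation, and in general has to satisfy several constraints at once, which this sketch does not attempt.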