1 Feb 2024 | Junjie Wen, Yichen Zhu, Minjie Zhu, Jiming Li, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, and Jian Tang
This paper introduces the Object-Centric Instruction Augmentation (OCI) framework, which enhances robotic manipulation by incorporating object positions into language instructions. A Multimodal Large Language Model (MLLM) is fine-tuned to recognize object locations and encode them into natural language, augmenting each instruction with both absolute and relative position cues so that the policy network receives explicit spatial grounding rather than plain text. The authors also introduce a feature reuse mechanism that lets the policy network leverage the pre-trained MLLM's features, improving performance without incurring high computational cost.

Evaluated on both simulated and real-world manipulation tasks, OCI outperforms policies trained on traditional language instructions, with the positional cues significantly raising success rates. The results further highlight the value of pre-trained models along the "where" dimension: clear positional information expressed in natural language leads to more accurate and efficient manipulation, with better task execution and generalization across scenarios.
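To make the augmentation idea concrete, here is a minimal sketch of what injecting absolute and relative position cues into an instruction might look like. The function name, the normalized bounding-box format, and the cue templates are all illustrative assumptions, not the paper's actual prompt design.

```python
# Hypothetical sketch: augment a language instruction with absolute and
# relative position cues, assuming a detector has already produced
# normalized (x0, y0, x1, y1) bounding boxes for named objects.

def augment_instruction(instruction, objects, target):
    """Append absolute and relative position cues for the target object."""
    x0, y0, x1, y1 = objects[target]
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2  # absolute cue: box center

    cues = [f"The {target} is at ({cx:.2f}, {cy:.2f}) in the image."]

    # Relative cue: name the nearest other object and a coarse direction.
    others = {name: box for name, box in objects.items() if name != target}
    if others:
        def center(box):
            return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)
        nearest = min(
            others,
            key=lambda n: (center(others[n])[0] - cx) ** 2
                        + (center(others[n])[1] - cy) ** 2,
        )
        ox, _ = center(others[nearest])
        direction = "left of" if cx < ox else "right of"
        cues.append(f"It is to the {direction} the {nearest}.")

    return instruction + " " + " ".join(cues)


print(augment_instruction(
    "Pick up the red block.",
    {"red block": (0.1, 0.4, 0.2, 0.5), "blue bowl": (0.6, 0.4, 0.8, 0.6)},
    target="red block",
))
# -> "Pick up the red block. The red block is at (0.15, 0.45) in the
#     image. It is to the left of the blue bowl."
```

In the paper the MLLM itself generates these cues after fine-tuning; the sketch above only shows the shape of the augmented instruction the policy would consume.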
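The feature reuse mechanism can likewise be sketched in a few lines of PyTorch: intermediate features from the frozen, pre-trained MLLM are projected through a small adapter and fused with the policy's own observation encoding. Module names, dimensions, and the fusion-by-concatenation choice here are assumptions for illustration, not the paper's exact architecture.

```python
# Hypothetical sketch of feature reuse: a frozen MLLM's cached features are
# adapted and concatenated with the policy's observation encoding, so the
# policy benefits from pre-trained representations at low added cost.

import torch
import torch.nn as nn

class FeatureReusePolicy(nn.Module):
    def __init__(self, obs_dim=512, mllm_dim=4096, act_dim=7):
        super().__init__()
        # A small linear adapter keeps the policy cheap: the large MLLM
        # stays frozen, and only its output feature is projected down.
        self.adapter = nn.Linear(mllm_dim, obs_dim)
        self.policy_head = nn.Sequential(
            nn.Linear(obs_dim * 2, 256),
            nn.ReLU(),
            nn.Linear(256, act_dim),  # e.g. end-effector delta + gripper
        )

    def forward(self, obs_feat, mllm_feat):
        fused = torch.cat([obs_feat, self.adapter(mllm_feat)], dim=-1)
        return self.policy_head(fused)

policy = FeatureReusePolicy()
obs_feat = torch.randn(1, 512)        # policy's own visual/state encoding
mllm_feat = torch.randn(1, 4096)      # cached feature from the frozen MLLM
action = policy(obs_feat, mllm_feat)  # -> shape (1, 7)
```

The design point is that the expensive model runs once to produce reusable features, while only the lightweight adapter and policy head are trained.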