4 Oct 2024 | Mengda Xu, Zhenjia Xu, Yinghao Xu, Cheng Chi, Gordon Wetzstein, Manuela Veloso, Shuran Song
Im2Flow2Act is a scalable learning framework that enables robots to acquire real-world manipulation skills without requiring real-world robot training data. The key idea is to use object flow as a manipulation interface to bridge domain gaps between different embodiments (human and robot) and training environments (real-world and simulated). The framework consists of two components: a flow generation network and a flow-conditioned policy. The flow generation network, trained on human demonstration videos, generates object flow from the initial scene image, conditioned on the task description. The flow-conditioned policy, trained on simulated robot play data, maps the generated object flow to robot actions that realize the desired object movements. Because the policy takes flow as input, it can be deployed directly in the real world with a minimal sim-to-real gap. By leveraging real-world human videos and simulated robot play data, the system bypasses the challenges of teleoperating physical robots in the real world, which makes it scalable to diverse tasks. Without any real-world robot data for training, the system achieves an average success rate of 81% across four real-world tasks involving rigid, articulated, and deformable objects. The framework outperforms baselines in both simulation and real-world settings, showing the effectiveness of object flow as a unifying interface for cross-domain and cross-embodiment learning. The system is robust to different embodiments during robot deployment and can learn from unstructured exploration data through an alignment module. The results also highlight the necessity of a learning-based policy for translating flow into accurate and safe actions, as heuristic-based policies struggle with tasks involving deformable and articulated objects. 
The system's design allows for efficient training and deployment, making it a promising approach for scalable robotic manipulation.
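
The two-stage structure described above (image and task description to object flow, then object flow to robot actions) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `ToyFlowGenerator`, `ToyFlowPolicy`, and `run_pipeline`, the representation of flow as per-step 2D keypoint positions, and the tiny task vocabulary are hypothetical and do not come from the Im2Flow2Act code. Both components in the actual framework are learned models; the toy policy here uses a simple averaging rule, which is exactly the kind of heuristic the summary says struggles with deformable and articulated objects, so it only shows the interface between the two stages.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) position of one tracked object point
Flow = List[List[Point]]      # flow[t][k]: position of keypoint k at step t


@dataclass
class ToyFlowGenerator:
    """Hypothetical stand-in for the flow generation network.

    The real component is learned from human demonstration videos; this toy
    version just slides every keypoint along a fixed per-task direction.
    """
    horizon: int = 4

    def generate(self, keypoints: List[Point], task: str) -> Flow:
        # Assumed task vocabulary, for illustration only.
        directions = {"move right": (1.0, 0.0), "move down": (0.0, 1.0)}
        dx, dy = directions[task]
        return [
            [(x + dx * t, y + dy * t) for (x, y) in keypoints]
            for t in range(self.horizon + 1)
        ]


@dataclass
class ToyFlowPolicy:
    """Hypothetical stand-in for the flow-conditioned policy.

    The real component is learned from simulated robot play data; this toy
    version returns the mean keypoint displacement as a 2D motion command.
    """

    def act(self, flow: Flow, step: int) -> Point:
        current, target = flow[step], flow[step + 1]
        n = len(current)
        mean_dx = sum(b[0] - a[0] for a, b in zip(current, target)) / n
        mean_dy = sum(b[1] - a[1] for a, b in zip(current, target)) / n
        return (mean_dx, mean_dy)


def run_pipeline(keypoints: List[Point], task: str):
    """Chain the two stages: generate object flow, then map it to actions."""
    generator = ToyFlowGenerator()
    policy = ToyFlowPolicy()
    flow = generator.generate(keypoints, task)
    actions = [policy.act(flow, t) for t in range(len(flow) - 1)]
    return flow, actions


if __name__ == "__main__":
    flow, actions = run_pipeline([(0.0, 0.0), (2.0, 2.0)], "move right")
    print(actions)
```

The point of the sketch is the interface: the policy only ever sees object flow, never raw images or the demonstrator's embodiment, which is why the summary describes flow as the bridge between human videos and simulated robot data.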