General Flow as Foundation Affordance for Scalable Robot Learning

23 Sep 2024 | Chengbo Yuan, Chuan Wen, Tong Zhang, Yang Gao
This paper introduces General Flow as a foundation affordance for scalable robot learning. The authors propose a framework for acquiring real-world manipulation skills that uses 3D flow, the future trajectories of 3D points on objects of interest, as its prediction target. They train a language-conditioned 3D flow prediction model directly on large-scale RGBD human video datasets, which enables zero-shot skill transfer to real robots.

The framework offers three key benefits: (1) scalability, by leveraging cross-embodiment data resources such as human video; (2) wide applicability across object categories, including rigid, articulated, and soft bodies; and (3) stable skill transfer, by providing actionable guidance with a small inference domain gap.

At deployment time, the trained model drives a heuristic policy based on closed-loop flow prediction (sketched below). With this policy, the method achieves an 81% success rate in zero-shot human-to-robot skill transfer across 18 tasks in 6 scenes.

Extensive experiments and analysis validate the approach, demonstrating robustness to segmentation errors, generalization to novel shapes, and adaptability to diverse scenes and motion directions. On flow prediction itself, the model achieves superior performance on standard metrics, including 3D Average Displacement Error (ADE) and Final Displacement Error (FDE). The paper also discusses related work and highlights the potential of general flow for advancing scalable, general robot learning.
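To make the closed-loop heuristic policy concrete, the sketch below shows one plausible instantiation: fit a least-squares rigid transform (Kabsch/SVD method) mapping the current 3D points to their first predicted flow waypoints, then command the end-effector with that motion. This is a minimal sketch, not the authors' exact implementation; `predict_flow` and the `robot.apply_delta` call are hypothetical placeholders for the flow model and the robot control interface.

```python
import numpy as np

def fit_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) with dst ~= R @ src + t,
    computed via the Kabsch/SVD method. src, dst: (N, 3) arrays."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def closed_loop_step(predict_flow, points, instruction, robot):
    """One control step: predict flow, fit a rigid motion, execute it.

    predict_flow(points, instruction) -> (N, T, 3) future trajectories
    (hypothetical model interface); robot.apply_delta(R, t) is a
    hypothetical end-effector command.
    """
    flow = predict_flow(points, instruction)  # (N, T, 3) predicted flow
    next_points = flow[:, 0]                  # first predicted waypoint
    R, t = fit_rigid_transform(points, next_points)
    robot.apply_delta(R, t)
```

Re-predicting the flow after every executed step is what makes the loop closed: each new observation corrects drift from the previous prediction.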
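The evaluation metrics are standard trajectory-prediction measures and can be stated precisely: ADE averages the per-point Euclidean error over all future timesteps, while FDE measures the error at the final timestep only. A minimal NumPy sketch, assuming predictions and ground truth are stored as (N points, T timesteps, 3) arrays:

```python
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """3D Average / Final Displacement Error for predicted point trajectories.

    pred, gt: (N, T, 3) arrays -- N query points, T future steps, xyz coords.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)  # (N, T) per-step errors
    ade = dists.mean()                          # mean over points and steps
    fde = dists[:, -1].mean()                   # mean error at the last step
    return float(ade), float(fde)

# Tiny self-check: identical trajectories give zero error.
traj = np.random.rand(8, 16, 3)
assert ade_fde(traj, traj) == (0.0, 0.0)
```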