The paper "General Flow as Foundation Affordance for Scalable Robot Learning" addresses the challenge of acquiring real-world manipulation skills within a scalable framework. The authors propose 3D flow, which represents the future trajectories of 3D points on objects of interest, as an ideal prediction target. They train a language-conditioned 3D flow prediction model directly on large-scale RGBD human video datasets, and the predicted flow offers actionable guidance for zero-shot skill transfer in real-world scenarios. The method achieves an 81% success rate in zero-shot human-to-robot skill transfer across 18 tasks in 6 scenes, covering multiple object categories including rigid, articulated, and soft bodies. The authors characterize the framework by its scalability, wide applicability, and stable skill transfer, and present it as a step toward scalable general robot learning. The paper also discusses the system's robustness to challenges such as segmentation errors, novel shapes, varied grasp manners, and diverse scenes.
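
To make "actionable guidance" concrete, the sketch below shows one generic way that predicted 3D flow could be turned into a motion target: fitting a least-squares rigid transform (the Kabsch algorithm) that carries the query points to their predicted next positions. This is an illustration under stated assumptions, not necessarily the policy used in the paper. The function name `rigid_transform_from_flow` and the array layout (N points, each with a 3D displacement for one time step) are hypothetical choices made for this example.

```python
import numpy as np


def rigid_transform_from_flow(points, flow_step):
    """Fit a rigid transform (R, t) such that R @ p + t ~= p + flow for each point.

    points:    (N, 3) array of 3D query points on the object (assumed layout).
    flow_step: (N, 3) array of predicted displacements for one time step.
    Uses the Kabsch algorithm; needs at least 3 non-collinear points.
    """
    src = np.asarray(points, dtype=float)
    dst = src + np.asarray(flow_step, dtype=float)

    # Center both point sets on their centroids.
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In a setting like the one summarized above, such a transform could be applied to the pose of a gripper that is rigidly holding the object. That only holds for the rigid case; articulated and soft bodies would need a different treatment, so the sketch covers just one of the object categories the summary mentions.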