[slides and audio] Towards Generalist Robot Learning from Internet Video: A Survey

This survey provides an overview of methods for learning from video (LfV) in the context of reinforcement learning (RL) and robotics, focusing on scalable approaches to large internet video datasets. The authors highlight the potential benefits of LfV, such as improved generalization beyond available robot data and enhanced data efficiency, and discuss key challenges like missing information in videos and distribution shifts between video and robot domains. The survey covers video foundation model techniques for extracting knowledge from large, heterogeneous video datasets, methods leveraging video data for robot learning, and techniques for mitigating LfV challenges. It also examines LfV datasets and benchmarks, and concludes with a discussion of future challenges and opportunities, advocating for scalable foundation model approaches that can leverage the full range of internet video data to target the learning of promising RL knowledge modalities: policies and dynamics models. The survey aims to serve as a comprehensive reference for the emerging field of LfV, fostering further research and progress towards the development of general-purpose robots.

Towards Generalist Robot Learning from Internet Video: A Survey

7 Jun 2024 | Robert McCarthy, Daniel C.H. Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, Zhibin Li