Understanding Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation

This paper presents Human to Humanoid (H2O), a reinforcement learning (RL) based framework that enables real-time whole-body teleoperation of a full-sized humanoid robot using only an RGB camera. The system addresses the challenge of translating human motions into actions that a humanoid robot can physically perform. To create a large-scale retargeted motion dataset, the authors propose a scalable "sim-to-data" process that uses a privileged motion imitator to filter and select feasible motions. They then train a robust real-time humanoid motion imitator in simulation and transfer it to the real humanoid robot in a zero-shot manner. The system achieves real-time whole-body teleoperation in real-world scenarios, including walking, back jumping, kicking, turning, waving, pushing, and boxing. To the authors' knowledge, this is the first demonstration of learning-based real-time whole-body humanoid teleoperation.

The system involves three main components: motion retargeting, sim-to-real training, and real-time teleoperation. For motion retargeting, the authors use a two-step process that first aligns the SMPL body model to the humanoid's structure and then retargets the human motions to the robot. A privileged motion imitator then filters out motions that are infeasible for the robot. For sim-to-real training, the authors use domain randomization to bridge the gap between simulation and reality. For real-time teleoperation, the system uses an RGB camera and 3D human pose estimation to capture human motions, which the robot mimics in real time. Together, these components demonstrate the feasibility of an RL-based real-time Human-to-Humanoid (H2O) teleoperation system.
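
The "sim-to-data" filtering idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rollout function, the error metric (a per-frame tracking error), the 0.5 threshold, and all names are assumptions made for illustration. It only shows the general pattern of replaying each retargeted clip with an imitator in simulation and keeping the clips that are tracked well.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical type: a function that replays one motion clip with a privileged
# imitator in simulation and returns the per-frame tracking error.
RolloutFn = Callable[[str], Sequence[float]]


@dataclass
class FilterResult:
    feasible: List[str]
    infeasible: List[str]


def filter_feasible_motions(
    motion_ids: Sequence[str],
    rollout: RolloutFn,
    max_error: float = 0.5,  # assumed threshold, not taken from the paper
) -> FilterResult:
    """Keep only clips whose tracking error stays within max_error on every frame."""
    feasible: List[str] = []
    infeasible: List[str] = []
    for motion_id in motion_ids:
        errors = rollout(motion_id)
        # A rollout that produced no frames is treated as a failure.
        if errors and max(errors) <= max_error:
            feasible.append(motion_id)
        else:
            infeasible.append(motion_id)
    return FilterResult(feasible, infeasible)


# Toy stand-in for simulation rollouts: made-up per-frame errors per clip.
_fake_errors = {
    "walk": [0.05, 0.08, 0.06],
    "wave": [0.02, 0.03, 0.02],
    "cartwheel": [0.10, 0.90, 1.40],
}
result = filter_feasible_motions(list(_fake_errors), _fake_errors.__getitem__)
print(result.feasible)    # clips kept for training
print(result.infeasible)  # clips dropped as infeasible
```

In this toy run the "cartwheel" clip is dropped because its error exceeds the assumed threshold, while "walk" and "wave" are kept. A real pipeline would replace the dictionary lookup with actual simulator rollouts.
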
The contributions include: 1) a scalable retargeting and "sim-to-data" process to obtain a large-scale motion dataset feasible for the real-world humanoid robot; 2) sim-to-real transfer of the RL-based whole-body tracking controller that scales to a large number of motions; and 3) a real-time teleoperation system with an RGB camera and 3D human pose estimation, demonstrating the fulfillment of various whole-body motions including walking, pick-and-place, stroller pushing, boxing, handwaving, ball kicking, etc.
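
Domain randomization, which the summary names as the bridge between simulation and reality, can be sketched in a few lines of Python. The parameter names and ranges below are illustrative assumptions, not values from the paper. The sketch only shows the common pattern of drawing a fresh set of simulator parameters for each training episode so the policy does not overfit to one simulated world.

```python
import random
from typing import Dict, Tuple

# All parameter names and ranges are illustrative assumptions.
RANGES: Dict[str, Tuple[float, float]] = {
    "ground_friction": (0.5, 1.25),
    "link_mass_scale": (0.9, 1.1),
    "motor_strength_scale": (0.8, 1.2),
    "action_delay_s": (0.0, 0.04),
}


def sample_episode_params(
    rng: random.Random,
    ranges: Dict[str, Tuple[float, float]] = RANGES,
) -> Dict[str, float]:
    """Draw one set of simulator parameters, uniformly within each range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


# A seeded generator keeps the sampled episodes reproducible.
rng = random.Random(0)
episodes = [sample_episode_params(rng) for _ in range(3)]
for params in episodes:
    print(params)
```

Each dictionary would be applied to the simulator before an episode starts. Uniform sampling is only one possible choice and is used here for simplicity.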

Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation

7 Mar 2024 | Tairan He, Zhengyi Luo, Wenli Xiao, Chong Zhang, Kris Kitani, Changliu Liu, Guanya Shi