Understanding HumanPlus: Humanoid Shadowing and Imitation from Humans

HumanPlus is a full-stack system for humanoid robots to learn motion and autonomous skills from human data. The system enables robots to shadow fast, diverse motions from a human operator, including boxing and playing table tennis, and to learn autonomous skills like wearing a shoe, folding clothes, and jumping high. It uses a low-level policy trained in simulation via reinforcement learning on 40 hours of existing human motion data. This policy transfers to the real world and allows humanoid robots to follow human body and hand motion in real time using only an RGB camera, a process called shadowing. Through shadowing, human operators can teleoperate humanoids to collect whole-body data for learning different tasks in the real world. Using the collected data, the system performs supervised behavior cloning to train skill policies from egocentric vision, allowing humanoids to complete different tasks autonomously by imitating human skills. The system is demonstrated on a customized 33-DoF, 180 cm humanoid that autonomously completes tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, typing, and greeting another robot, with 60-100% success rates using up to 40 demonstrations. The system has two main components. The first is a real-time shadowing system that lets human operators control the humanoid's whole body using a single RGB camera and a low-level policy trained on massive human motion data in simulation. The second is an imitation learning algorithm that enables efficient learning from 40 demonstrations for binocular perception and high-DoF control. Together, the two components allow whole-body manipulation and locomotion skills, such as wearing a shoe to stand up and walk, to be learned directly in the real world.
The system is compared with other teleoperation methods and shows higher success rates in imitation learning tasks. It is robust to disturbances, withstanding larger forces than the manufacturer's default controller, and enables more whole-body skills than that controller. It is tested on six tasks: wearing a shoe and walking, warehouse unloading, folding clothes, rearranging objects, typing "AI", and a two-robot greeting, demonstrating the robot's ability in complex bimanual manipulation. This full-stack approach enables the humanoid to learn complex autonomous skills from human data.
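The supervised behavior cloning step described above can be illustrated with a minimal sketch. It treats demonstrations as (observation, action) pairs and fits a policy by regression. Everything here is an assumption made for illustration: the linear policy, the ridge-regularized least-squares fit, the synthetic data, and the names fit_bc_policy, policy_action, and make_synthetic_demos are hypothetical and are not the policy architecture, inputs, or training procedure used by HumanPlus.

```python
import numpy as np


def fit_bc_policy(observations, actions, ridge=1e-3):
    """Fit a linear policy a = [o, 1] @ W by ridge-regularized least squares.

    This is a stand-in for behavior cloning: supervised regression from
    observations to the actions recorded in demonstrations.
    """
    obs = np.asarray(observations, dtype=float)
    act = np.asarray(actions, dtype=float)
    # Append a constant 1 to every observation so the policy has a bias term.
    X = np.hstack([obs, np.ones((obs.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ act)


def policy_action(W, observation):
    """Predict one action from one observation with the fitted policy."""
    o = np.append(np.asarray(observation, dtype=float), 1.0)
    return o @ W


def make_synthetic_demos(n_demos=40, steps=50, obs_dim=8, act_dim=4, seed=0):
    """Generate synthetic demonstrations (hypothetical data, not robot data)."""
    rng = np.random.default_rng(seed)
    true_W = rng.normal(size=(obs_dim + 1, act_dim))
    obs = rng.normal(size=(n_demos * steps, obs_dim))
    X = np.hstack([obs, np.ones((obs.shape[0], 1))])
    # The "operator" actions are a noisy linear function of the observations.
    actions = X @ true_W + 0.01 * rng.normal(size=(obs.shape[0], act_dim))
    return obs, actions


obs, actions = make_synthetic_demos()
W = fit_bc_policy(obs, actions)

# Mean squared error of the cloned policy on the demonstration data.
pred = np.hstack([obs, np.ones((len(obs), 1))]) @ W
mse = float(np.mean((pred - actions) ** 2))
print(f"training MSE: {mse:.6f}")
```

In this sketch the 40 synthetic demonstrations play the role of teleoperated data collected through shadowing. A real skill policy would map camera images and robot state to high-DoF joint targets with a far more expressive model.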

HumanPlus: Humanoid Shadowing and Imitation from Humans

June 2024 | Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn