June 2024 | Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn
The paper presents HumanPlus, a full-stack system designed to enable humanoid robots to learn motion and autonomous skills from human data. The system addresses the challenges of humanoid perception and control, physical differences between humanoids and humans, and the lack of a data pipeline for learning autonomous skills from egocentric vision. Key components include:
1. **Low-Level Policy Training**: A low-level policy, the Humanoid Shadowing Transformer, is trained in simulation using reinforcement learning on a 40-hour human motion dataset (AMASS). This policy is then transferred to the real world, allowing the humanoid to shadow human operators in real time using only an RGB camera.
2. **Shadowing System**: The system enables human operators to teleoperate the humanoid, collecting whole-body data for various tasks. This data is used to train skill policies using supervised behavior cloning.
3. **Skill Policies**: The Humanoid Imitation Transformer (HIT) is a decoder-only transformer that processes egocentric RGB vision and predicts desired body and hand poses. HIT incorporates forward dynamics prediction to enhance performance and prevent overfitting to proprioception.
4. **Task Demonstration**: The system demonstrates autonomous completion of tasks such as wearing a shoe, unloading objects, folding clothes, rearranging objects, typing, and greeting another robot with 60-100% success rates using up to 40 demonstrations.
5. **Hardware Details**: The humanoid has 33 degrees of freedom: two 6-DoF hands, two 1-DoF wrists, and a 19-DoF body (2×6 + 2×1 + 19 = 33). It is equipped with two egocentric RGB cameras and can exert forces up to 10 N and hold items up to 7.5 kg.
6. **Evaluation**: The system is evaluated through experiments on shadowing and imitation tasks, showing superior performance compared to baselines in terms of task completion time, success rates, and robustness.
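
The idea in item 3, behavior cloning combined with forward dynamics prediction, can be sketched as a two-term training objective. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the use of mean squared error for both terms, and the `dynamics_weight` value are all hypothetical choices made here for clarity.

```python
from typing import Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if len(pred) == 0:
        raise ValueError("inputs must be non-empty")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def imitation_loss(
    pred_action: Sequence[float],
    demo_action: Sequence[float],
    pred_next_feature: Sequence[float],
    true_next_feature: Sequence[float],
    dynamics_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Behavior-cloning loss plus an auxiliary forward-dynamics term.

    action term:   how far the predicted pose targets are from the
                   demonstrated ones (supervised behavior cloning).
    dynamics term: how far the predicted future visual feature is from
                   the observed one; this pushes the policy to use
                   vision instead of relying only on proprioception.
    """
    action_term = mean_squared_error(pred_action, demo_action)
    dynamics_term = mean_squared_error(pred_next_feature, true_next_feature)
    return action_term + dynamics_weight * dynamics_term


if __name__ == "__main__":
    loss = imitation_loss(
        pred_action=[1.0, 2.0],
        demo_action=[1.0, 4.0],
        pred_next_feature=[0.0, 0.0],
        true_next_feature=[2.0, 0.0],
    )
    print(loss)
```

In a real training loop both terms would be computed over batches of tensors and backpropagated through the transformer; the scalar version here only shows how the auxiliary term enters the objective.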
The main contributions of the paper are the HumanPlus system and the Humanoid Shadowing Transformer and Humanoid Imitation Transformer algorithms, which enable efficient learning and manipulation of complex skills in the real world.