15 Jun 2024 | Ziwen Zhuang, Shenzhe Yao, Hang Zhao
This paper presents a novel framework for learning an end-to-end vision-based whole-body-control parkour policy for humanoid robots. The system learns multiple parkour skills, including jumping onto platforms, leaping over hurdles, and navigating varied terrain, without requiring any prior motion references. The authors add fractal noise to the terrain to encourage foot-raising, which simplifies the reward function and enables the policy to follow turning commands even on straight tracks. Training proceeds in two stages: first, a walking policy is trained to follow locomotion commands; second, a parkour policy is trained with an auto-curriculum mechanism over 10 different terrain types. The policy is then distilled using DAgger, accelerated across 4 GPUs, for deployment on a real humanoid robot. The system demonstrates robust performance in both indoor and outdoor environments, autonomously selecting the appropriate parkour skill while following joystick commands. The paper also shows that the framework transfers to humanoid mobile manipulation tasks by overriding the arm actions. Experiments validate the method, demonstrating superior performance over existing approaches and highlighting the importance of onboard vision and multi-GPU acceleration for efficient and effective parkour.
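The summary does not include the authors' terrain-generation code, but the fractal-noise idea is standard: sum several octaves of lattice noise, halving the spatial scale and amplitude each octave, and add the result on top of the obstacle heightfield so flat ground is never perfectly flat. The sketch below is a minimal, self-contained illustration of that idea in NumPy; all function names and parameter values (cell sizes, octave count, amplitude) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def value_noise(shape, cell, rng):
    """Bilinearly interpolated random lattice noise at one spatial frequency."""
    gh, gw = shape[0] // cell + 2, shape[1] // cell + 2
    grid = rng.uniform(-1.0, 1.0, size=(gh, gw))
    ys = np.arange(shape[0]) / cell
    xs = np.arange(shape[1]) / cell
    y0, x0 = ys.astype(int), xs.astype(int)
    ty, tx = ys - y0, xs - x0
    # Smoothstep weights soften visible lattice artifacts.
    ty, tx = ty * ty * (3 - 2 * ty), tx * tx * (3 - 2 * tx)
    top = grid[y0][:, x0] * (1 - tx) + grid[y0][:, x0 + 1] * tx
    bot = grid[y0 + 1][:, x0] * (1 - tx) + grid[y0 + 1][:, x0 + 1] * tx
    return top * (1 - ty[:, None]) + bot * ty[:, None]

def fractal_noise_heightfield(shape=(256, 256), base_cell=64, octaves=4,
                              amplitude=0.05, gain=0.5, seed=0):
    """Fractal noise: each octave halves the cell size and the amplitude."""
    rng = np.random.default_rng(seed)
    height = np.zeros(shape)
    amp, cell = amplitude, base_cell
    for _ in range(octaves):
        height += amp * value_noise(shape, cell, rng)
        amp *= gain
        cell = max(cell // 2, 1)
    return height  # height offsets (e.g., meters) to add to the terrain grid
```

In a GPU-parallel simulator, an array like this would be added to the terrain heightfield before the simulated mesh is built; because the policy must lift its feet to clear the small bumps, the reward no longer needs an explicit foot-clearance term.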
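The distillation step uses DAgger, whose core loop is well known: roll out the student so it visits its own state distribution, have the teacher relabel those states with expert actions, and regress the student onto the labels. The PyTorch sketch below shows this generic loop, assuming (as the summary suggests) a teacher that consumes privileged simulator state and a student that consumes onboard depth images; the `teacher`, `student`, and `envs` interfaces here are hypothetical stand-ins, not the paper's API.

```python
import torch

def dagger_distill(teacher, student, envs, optimizer, iters=1000, steps=64):
    """Generic DAgger loop (sketch): the student drives the rollout, the
    teacher provides action labels at every visited state. Here `envs.step`
    is assumed to return the next observation dict with 'depth' (onboard
    camera) and 'privileged' (simulator-only state) entries."""
    obs = envs.reset()
    for _ in range(iters):
        batch_obs, batch_act = [], []
        with torch.no_grad():
            for _ in range(steps):
                # Student action drives the rollout, so training data comes
                # from the student's own state distribution (the DAgger idea).
                action = student(obs["depth"])
                # Teacher relabels the visited state with the expert action.
                batch_obs.append(obs["depth"])
                batch_act.append(teacher(obs["privileged"]))
                obs = envs.step(action)
        # Behavior-cloning regression on the aggregated (state, label) pairs.
        pred = student(torch.cat(batch_obs))
        loss = torch.nn.functional.mse_loss(pred, torch.cat(batch_act))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The 4-GPU acceleration mentioned in the summary could plausibly be realized by running one batch of simulated environments per GPU and wrapping the student in `torch.nn.parallel.DistributedDataParallel`, though the exact multi-GPU setup is not specified here.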