Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion

21 May 2024 | Tairan He, Chong Zhang, Wenli Xiao, Guanqi He, Changliu Liu, Guanya Shi
This paper introduces Agile But Safe (ABS), a learning-based control framework that enables agile and collision-free locomotion for quadrupedal robots. ABS pairs an agile policy, which executes agile motor skills amidst obstacles, with a recovery policy that prevents failures; together they achieve high-speed, collision-free navigation. The switch between the two policies is governed by a learned control-theoretic reach-avoid (RA) value network, which also serves as an objective function for the recovery policy, safeguarding the robot in a closed loop.

The agile policy, the RA value network, the recovery policy, and an exteroception representation network are all trained in simulation. The trained modules can be deployed directly in the real world with onboard sensing and computation, yielding high-speed, collision-free navigation in confined indoor and outdoor spaces with both static and dynamic obstacles.

In the dual-policy setup, the agile policy lets the robot run fast amidst obstacles, while the recovery policy rescues the robot in risky cases where the agile policy might fail, with the RA values governing the policy switch. The RA value network is trained with a discounted RA Bellman equation on data collected by the learned agile policy in simulation; it also provides gradient information that guides the recovery policy, thus closing the loop.
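The RA machinery can be made concrete with a short sketch. The Python/PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the network sizes, the observation layout, the sign convention (negative RA value predicting reach-avoid success), the margin functions, and the gradient-descent use of the RA value for the recovery command are all assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; in the paper all modules are trained in simulation.
OBS_DIM = 16                                   # assumed observation size
ra_value_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ELU(), nn.Linear(64, 1))

GAMMA = 0.999        # discount factor of the RA Bellman backup (assumed value)
V_THRESHOLD = 0.0    # convention assumed here: V < 0 predicts reach-avoid success

def ra_bellman_target(obs, next_obs, reach_margin, avoid_margin):
    """Discounted reach-avoid Bellman target, evaluated on rollouts of the
    learned agile policy. Assumed sign convention: reach_margin(s) <= 0 inside
    the goal set, avoid_margin(s) > 0 inside the collision set."""
    l = reach_margin(obs)                           # reach margin  l(s)
    zeta = avoid_margin(obs)                        # avoid margin  zeta(s)
    v_next = ra_value_net(next_obs).squeeze(-1).detach()
    return (1.0 - GAMMA) * torch.maximum(l, zeta) + \
           GAMMA * torch.maximum(zeta, torch.minimum(l, v_next))

def _obs_with_twist(obs, twist):
    # Hypothetical helper: overwrite the commanded-twist slots of the observation.
    out = obs.clone()
    out[..., :3] = twist
    return out

def select_action(obs, agile_policy, recovery_policy):
    """RA-value-governed switch: keep the agile policy while the predicted RA
    value is below threshold; otherwise hand control to the recovery policy."""
    if ra_value_net(obs).item() < V_THRESHOLD:
        return agile_policy(obs)
    # The RA network is differentiable, so its gradient w.r.t. a candidate twist
    # command can steer the robot toward safer states (one plausible reading of
    # "gradient information guides the recovery policy").
    twist = torch.zeros(3, requires_grad=True)
    for _ in range(10):
        v = ra_value_net(_obs_with_twist(obs, twist)).squeeze(-1)
        (grad,) = torch.autograd.grad(v, twist)
        twist = (twist - 0.1 * grad).detach().requires_grad_(True)
    return recovery_policy(obs, twist.detach())
```

In this sketch the recovery policy receives an explicitly optimized twist command; the essential property is only that the RA value network is a differentiable safety critic, letting it act both as a switch and as an objective.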
To achieve collision-avoidance behaviors that generalize across scenarios, the policies and the RA value network are trained on a low-dimensional exteroceptive feature: the traveling distances of several rays cast from the robot to obstacles. An exteroception representation network is trained on simulated data to map depth images to these ray distances (a sketch follows the contribution list below), enabling robust collision avoidance in high-speed locomotion with onboard sensing and computation.

The contributions of this work include:
1) a perceptive agile policy for obstacle avoidance in high-speed locomotion with novel training methods;
2) a novel control-theoretic, data-driven method for RA value estimation conditioned on the learned agile policy;
3) a dual-policy setup in which an agile policy and a recovery policy collaborate for high-speed collision-free locomotion, with RA values governing the policy switch and guiding the recovery policy;
4) an exteroception representation network that predicts low-dimensional obstacle information for generalizable collision avoidance;
5) validation of ABS's superior safety and state-of-the-art agility amidst obstacles, both indoors and outdoors.
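A minimal sketch of such an exteroception representation network follows, again under assumptions: the ray count, image resolution, and architecture are illustrative placeholders rather than the paper's; only the idea of regressing simulated ray distances from depth images comes from the text.

```python
import torch
import torch.nn as nn

NUM_RAYS = 11  # assumed number of cast rays (a low-dimensional exteroceptive feature)

class RayDistancePredictor(nn.Module):
    """Illustrative exteroception representation network: maps a depth image to
    the traveling distances of rays cast from the robot to obstacles.
    Layer sizes and the 1x64x64 input are assumptions, not the paper's."""

    def __init__(self, num_rays: int = NUM_RAYS):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ELU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ELU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 8 * 8, 128), nn.ELU(),
            nn.Linear(128, num_rays),
            nn.Softplus(),  # ray distances are non-negative
        )

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(depth))

# Supervised training against ray distances computed in simulation:
model = RayDistancePredictor()
depth_batch = torch.rand(32, 1, 64, 64)       # placeholder simulated depth images
ray_targets = torch.rand(32, NUM_RAYS) * 5.0  # placeholder ground-truth ray distances
loss = nn.functional.mse_loss(model(depth_batch), ray_targets)
loss.backward()
```

Regressing a handful of ray distances instead of feeding raw depth to the policies keeps the exteroceptive input low-dimensional, which is what the paper credits for collision-avoidance behavior that generalizes across scenes.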