2 Nov 2024 | Minghuan Liu*, Zixuan Chen*, Xuxin Cheng, Yandong Ji, Rizhao Qiu, Ruihan Yang, Xiaolong Wang
This paper presents a visual whole-body control (VBC) framework for legged loco-manipulation, enabling a quadruped robot to grasp objects of varying heights and configurations in both simulation and real-world environments. The framework consists of a low-level control policy that tracks body velocities and end-effector positions, and a high-level task-planning policy that generates those velocity and position commands from visual inputs. Both policies are trained in simulation and transferred to real robots without any real-world data collection or fine-tuning.

The hardware platform is a Unitree B1 quadruped equipped with a Unitree Z1 robotic arm and two cameras for perception. Training combines reinforcement learning and imitation learning: a privileged teacher policy provides immediate goals and guides the low-level policy to accomplish the task, and a visuomotor student policy is then distilled from the teacher using DAgger.

In both simulation and real-world experiments, the framework picks up diverse objects across varied configurations and environments with high success rates, exhibiting emergent retrying behavior and strong generalization. The system is fully autonomous and vision-based, operating indoors and outdoors without external instrumentation, and it outperforms baselines in flexibility and adaptability, particularly when handling objects at different heights.
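The two-level structure described above can be sketched in a few lines. This is a minimal illustrative skeleton, not the paper's implementation: the class names, observation sizes, and the placeholder linear/heuristic policies are all assumptions standing in for the learned networks.

```python
import numpy as np

class HighLevelPolicy:
    """Task planner: maps visual input to a body-velocity and end-effector
    command. (Hypothetical stand-in for the paper's learned visuomotor policy.)"""
    def act(self, depth_image):
        # Placeholder heuristic: walk forward toward a fixed grasp target.
        base_velocity = np.array([0.3, 0.0, 0.0])   # (vx, vy, yaw_rate)
        ee_target = np.array([0.5, 0.0, 0.4])       # end-effector position, base frame
        return base_velocity, ee_target

class LowLevelPolicy:
    """Whole-body controller: tracks the commanded velocity and end-effector
    position with joint-level actions for legs (12 DoF) and arm (6 DoF)."""
    def act(self, proprioception, base_velocity, ee_target):
        command = np.concatenate([base_velocity, ee_target])
        obs = np.concatenate([proprioception, command])
        # Placeholder zero linear policy in place of the learned RL controller.
        weights = np.zeros((18, obs.shape[0]))      # 18 joint targets
        return weights @ obs

# One control step of the hierarchy.
high, low = HighLevelPolicy(), LowLevelPolicy()
depth = np.zeros((64, 64))          # dummy depth image
proprio = np.zeros(48)              # dummy proprioceptive observation
vel_cmd, ee_cmd = high.act(depth)
joint_targets = low.act(proprio, vel_cmd, ee_cmd)
print(joint_targets.shape)          # (18,)
```

The key design point the sketch mirrors is the interface: the high-level policy never outputs joint actions, only a velocity and end-effector command, which lets the low-level controller be trained once and reused across tasks.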
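The teacher-to-student distillation step uses DAgger: the student is rolled out, the states it visits are labeled with the privileged teacher's actions, and the student is refit on the aggregated dataset. A toy sketch of that loop, with an assumed linear student, made-up dynamics, and a synthetic teacher (none of which are from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_policy(state):
    """Privileged teacher: sees ground-truth state (e.g. object pose)."""
    return -0.5 * state                      # illustrative expert action

class Student:
    """Stand-in visuomotor student: a linear model fit by least squares."""
    def __init__(self, dim):
        self.W = np.zeros((dim, dim))
    def act(self, state):
        return self.W @ state
    def fit(self, states, actions):
        # Ridge-regularized least squares on the aggregated dataset.
        X, Y = np.stack(states), np.stack(actions)
        self.W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ Y).T

def rollout(policy, steps=20, dim=4):
    states, s = [], rng.normal(size=dim)
    for _ in range(steps):
        states.append(s.copy())
        s = 0.9 * s + policy(s)              # toy dynamics
    return states

student, dataset_s, dataset_a = Student(4), [], []
for it in range(5):
    # DAgger core: roll out the *student*, label visited states with
    # *teacher* actions, aggregate, and refit.
    visited = rollout(student.act)
    dataset_s += visited
    dataset_a += [teacher_policy(s) for s in visited]
    student.fit(dataset_s, dataset_a)

print(len(dataset_s))                        # 100 states over 5 iterations
```

Collecting labels on the student's own state distribution, rather than only on teacher rollouts, is what lets the distilled policy recover from its own mistakes at deployment time.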