22 May 2024 | Toru Lin, Yu Zhang*, Qiyang Li*, Haozhi Qi*, Brent Yi, Sergey Levine, and Jitendra Malik
This paper presents a bimanual system with two multifingered hands that learns complex dexterous tasks from visuotactile data. Data collection relies on HATO, a low-cost teleoperation setup built from off-the-shelf hardware and software. The hardware consists of two UR5 robot arms fitted with Psyonic Ability Hands, prosthetic hands repurposed as robot end effectors with tactile sensors on their fingertips. Visual data is captured by RGB-D cameras, and tactile data comes from the fingertip sensors. Teleoperation runs on a Meta Quest 2, with the controllers' grip buttons driving power grasps and their thumbsticks controlling the thumbs.
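To make the control mapping concrete, here is a minimal sketch of how one controller reading could be turned into hand joint targets. The function name, joint ordering, and value ranges are illustrative assumptions, not the released HATO interface.

```python
import numpy as np

def map_controller_to_hand(grip_trigger: float,
                           thumbstick: tuple[float, float]) -> np.ndarray:
    """Map one Quest 2 controller reading to 6 hand joint targets.

    grip_trigger: analog grip value in [0, 1]; drives a power grasp in
    which all four fingers close together.
    thumbstick: (x, y) axes in [-1, 1]; drives the thumb's two DoFs
    (rotation and flexion) directly.
    Joint ordering and ranges here are assumptions for illustration.
    """
    # Index..pinky close in unison, scaled by the analog grip value.
    fingers = grip_trigger * np.ones(4)
    thumb_rotate, thumb_flex = thumbstick
    # Rescale stick axes from [-1, 1] to [0, 1] joint commands.
    thumb = 0.5 * (np.array([thumb_rotate, thumb_flex]) + 1.0)
    return np.concatenate([fingers, thumb])
```

For instance, `map_controller_to_hand(0.8, (0.0, 0.5))` would command a mostly closed power grasp with the thumb partly flexed.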
The system learns policies for four complex tasks: slippery object handover, block stacking, wine pouring, and steak serving, all of which demand bimanual coordination, precise control, and touch feedback. Policies trained on the collected visuotactile data enable the robot to perform long-horizon, high-precision tasks that are difficult to achieve without multifingered dexterity and touch sensing. The paper also studies how dataset size, sensing modality, and visual input preprocessing affect policy learning: vision and touch together significantly improve learning efficiency, success rate, and robustness, and a dataset of a few hundred demonstrations is sufficient to learn effective bimanual dexterous policies.
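As a rough picture of what a visuotactile policy input might look like, the sketch below fuses an RGB image with a flat vector of fingertip readings into a single feature for a downstream policy. The architecture and the tactile dimensionality are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class VisuotactileEncoder(nn.Module):
    """Toy encoder that fuses an RGB image with fingertip readings."""

    def __init__(self, img_feat_dim=512, tactile_dim=60, out_dim=256):
        super().__init__()
        # Small convnet stand-in; any pretrained visual backbone works.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, img_feat_dim),
        )
        # tactile_dim is an assumed size: the flattened fingertip
        # readings from both hands, concatenated into one vector.
        self.fuse = nn.Sequential(
            nn.Linear(img_feat_dim + tactile_dim, out_dim), nn.ReLU(),
        )

    def forward(self, rgb: torch.Tensor, tactile: torch.Tensor):
        # rgb: (B, 3, H, W); tactile: (B, tactile_dim)
        return self.fuse(torch.cat([self.vision(rgb), tactile], dim=-1))
```

A policy head (e.g., the diffusion policy discussed below) would then consume this fused feature alongside proprioceptive state.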
The learned policies display natural, human-like skills and showcase unprecedented dexterity. Training uses a diffusion policy approach, and an asynchronous inference algorithm runs diffusion model prediction in parallel with robot execution, allowing fast and smooth deployment. Evaluated on the four tasks, the policies achieve success rates ranging from 50% to 100%, and ablations show that touch sensing is critical for reliably completing many of the challenging tasks. All hardware and software systems, along with the teleoperation dataset, are open-sourced for further research and collaboration.
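The asynchronous inference idea can be sketched with a single background worker: while the robot plays out the current action chunk, the diffusion model is already sampling the next one, hiding sampling latency behind execution. The callables below are placeholders for the real policy and robot interfaces, not the released implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def run_async(predict_chunk, get_obs, execute, num_chunks: int) -> None:
    """Overlap diffusion sampling with robot execution.

    predict_chunk(obs) -> list of actions  (slow diffusion sampling)
    get_obs()          -> latest observation
    execute(action)    -> None             (one low-level control step)
    All three callables are placeholders for the real robot interface.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(predict_chunk, get_obs())
        for _ in range(num_chunks):
            chunk = future.result()  # blocks only if sampling lags
            # Start sampling the next chunk from the freshest observation
            # available now, while the current chunk plays out.
            future = pool.submit(predict_chunk, get_obs())
            for action in chunk:
                execute(action)
```

The key design choice is that prediction for chunk i+1 is conditioned on the observation available when chunk i begins, trading a slightly stale context for uninterrupted control.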