22 May 2024 | Toru Lin, Yu Zhang*, Qiyang Li*, Haozhi Qi*, Brent Yi, Sergey Levine, and Jitendra Malik
This article introduces a system for learning visuotactile skills using two multifingered hands, focusing on bimanual manipulation tasks. The system, called HATO (Hands-Arms Teleoperation), is a low-cost teleoperation setup that combines off-the-shelf hardware and a comprehensive software suite for data collection, multimodal processing, and policy learning. The hardware includes two UR5 robot arms equipped with Psyonic Ability Hands, which are modified prosthetic hands with tactile sensors. The system uses RGB-D cameras for visual input and tactile sensors for touch feedback. The goal is to enable robots to perform complex, high-precision tasks that require human-like dexterity and sensory feedback.
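To make the data-collection side of this concrete, here is a minimal sketch of what one logged timestep of a bimanual teleoperation demonstration might look like. The field names and dimensions are illustrative assumptions for a two-arm, two-hand rig, not the actual HATO data format.

```python
from dataclasses import dataclass
import time

@dataclass
class DemoStep:
    """One timestep of a teleoperation demonstration.

    All field names and shapes below are illustrative assumptions,
    not HATO's actual on-disk format.
    """
    timestamp: float
    rgb: list          # flattened RGB pixels per camera (placeholder)
    depth: list        # flattened depth map per camera (placeholder)
    tactile: list      # fingertip touch readings for both hands
    arm_joints: list   # 6 joint angles per UR5 arm (12 total for two arms)
    hand_joints: list  # finger joint positions for both hands

def record_step(rgb, depth, tactile, arm_joints, hand_joints):
    """Stamp and bundle one set of synchronized sensor readings."""
    return DemoStep(time.time(), rgb, depth, tactile, arm_joints, hand_joints)

# Minimal usage with dummy readings (dimensions are placeholders):
step = record_step(
    rgb=[0] * 10,
    depth=[0] * 10,
    tactile=[0.0] * 60,
    arm_joints=[0.0] * 12,
    hand_joints=[0.0] * 20,
)
print(len(step.arm_joints))  # 12
```

A demonstration is then just a time-ordered list of such steps, which downstream multimodal processing can slice per modality.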
The article describes four tasks—slippery handover, block stacking, wine pouring, and steak serving—that demonstrate the system's capabilities. The policies are learned from visuotactile data collected through HATO, showing that touch and vision significantly improve learning efficiency and task success. The study also investigates the impact of dataset size, sensing modalities, and visual preprocessing on policy performance. Results indicate that even a small dataset of a few hundred demonstrations can lead to effective policies. The research emphasizes the importance of tactile sensing for reliable dexterity and highlights the potential of HATO for future research in bimanual robotic manipulation.
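The modality study above—comparing policies given touch, vision, or both—can be sketched as a single observation vector that concatenates per-modality features before feeding a policy. The dimensions and the linear policy below are toy placeholders standing in for the deep network trained on demonstrations; none of this is HATO's actual architecture.

```python
import random

def fuse_observation(rgb_feats, tactile, proprio):
    """Concatenate per-modality feature lists into one flat policy input.

    Ablating a modality amounts to dropping (or zeroing) its segment.
    All dimensions are illustrative, not the actual HATO setup.
    """
    return rgb_feats + tactile + proprio

class LinearPolicy:
    """Toy linear policy standing in for the learned visuotactile network."""

    def __init__(self, obs_dim, act_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.01) for _ in range(obs_dim)]
                        for _ in range(act_dim)]

    def act(self, obs):
        # One action value per output row: a dot product with the observation.
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]

# Illustrative dimensions: 2 cameras x 64 visual features, 2 hands x 30
# tactile readings, 32 proprioceptive values (all placeholders).
rgb = [0.0] * (2 * 64)
tactile = [0.0] * (2 * 30)
proprio = [0.0] * 32

obs = fuse_observation(rgb, tactile, proprio)
policy = LinearPolicy(len(obs), act_dim=32)
action = policy.act(obs)
print(len(obs), len(action))  # 220 32
```

Comparing success rates of policies trained with and without the tactile segment is, in spirit, how the study isolates the contribution of touch.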