8 Jul 2024 | Xuxin Cheng, Jialong Li, Shiqi Yang, Ge Yang, Xiaolong Wang
Open-TeleVision is an immersive teleoperation system that enables high-precision, long-horizon robotic tasks through stereoscopic visual feedback and active perception. The robot's head and arms follow the operator's movements in real time, giving the operator a first-person view of the robot's surroundings. The system targets humanoid robots, such as the Unitree H1 and Fourier GR-1, and supports fine-grained manipulation tasks including can sorting, can insertion, towel folding, and tube unloading. A stereo RGB camera on the robot's head captures egocentric 3D observations, which are streamed in real time to the operator's VR device. A motion retargeting mechanism translates the operator's hand and arm movements into robot joint commands, enabling precise control of multi-finger hands for dexterous manipulation. The system is open source and was validated in experiments on these four tasks, which demonstrate long-horizon, precise control and show that the system is effective both for collecting demonstration data and for training imitation learning policies. Active visual feedback and real-time control were shown to improve the success rate and efficiency of teleoperation. The system also supports cross-country teleoperation, in which an operator in one location controls a robot in another. In user studies, stereo vision significantly improved task completion and success rates compared with monocular vision.
The system's design and implementation have been validated through extensive testing and comparison with existing teleoperation systems, showing its effectiveness in enabling immersive and precise robotic manipulation.
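The summary says the robot's head follows the operator's head movements. One way this could work is sketched below as a minimal illustration, not the authors' implementation. It assumes a neck with two degrees of freedom (yaw and pitch), a Z-up, X-forward frame, and a head orientation given as a 3x3 rotation matrix; the function names and joint limits are hypothetical.

```python
import math

# Hypothetical joint limits in radians; real values depend on the robot.
YAW_LIMIT = math.radians(60.0)
PITCH_LIMIT = math.radians(40.0)


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def yaw_pitch_rotation(yaw, pitch):
    """Build R = Rz(yaw) @ Ry(pitch) as nested lists (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    return [
        [cy * cp, -sy, cy * sp],
        [sy * cp, cy, sy * sp],
        [-sp, 0.0, cp],
    ]


def head_rotation_to_neck_angles(R):
    """Map an operator head rotation matrix to clamped (yaw, pitch) targets.

    Assumes R = Rz(yaw) @ Ry(pitch) @ Rx(roll). Roll is discarded because
    the assumed neck has only yaw and pitch joints.
    """
    yaw = math.atan2(R[1][0], R[0][0])
    # R[2][0] equals -sin(pitch); clamp guards against rounding outside [-1, 1].
    pitch = -math.asin(clamp(R[2][0], -1.0, 1.0))
    return (
        clamp(yaw, -YAW_LIMIT, YAW_LIMIT),
        clamp(pitch, -PITCH_LIMIT, PITCH_LIMIT),
    )


if __name__ == "__main__":
    # The operator looks slightly left and down; the neck targets match.
    operator_head = yaw_pitch_rotation(0.3, 0.2)
    print(head_rotation_to_neck_angles(operator_head))
```

Clamping to joint limits keeps the commanded neck pose reachable when the operator turns further than the robot can. A real system would also need to handle frame alignment between the VR device and the robot, which this sketch leaves out.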