1 Mar 2024 | Jun Wang, Yuzhe Qin, Kaiming Kuang, Yigit Korkmaz, Akhilan Gurumoorthy, Hao Su, Xiaolong Wang
CyberDemo is an approach to robotic imitation learning that uses human demonstrations collected in simulation to train policies for real-world tasks. By applying extensive data augmentation in the simulated environment, CyberDemo outperforms traditional in-domain real-world demonstrations when transferred to the real world, and it handles diverse physical and visual conditions, including significant lighting disturbances. Demonstrations are collected by teleoperation with a low-cost motion capture device and require minimal human effort. The policy is first trained on the augmented simulation data with curriculum learning and then fine-tuned with a few real-world demonstrations.
On a real robot, CyberDemo performs better than pre-trained policies such as R3M. It achieves a 35% higher success rate on quasi-static pick-and-place tasks and a 20% higher success rate on non-quasi-static rotation tasks. It also generalizes to previously unseen objects: in generalization tests it rotates novel tetra-valves and penta-valves with a 42.5% success rate, even though the human demonstrations covered only tri-valves. These results demonstrate the potential of simulated human demonstrations for real-world dexterous manipulation. The code and dataset are publicly available for further research.
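
The training order described above (augmented simulation data under a curriculum, then a few real-world demonstrations) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `run_curriculum`, the level names, the success threshold, and the rule of advancing once an evaluation score passes that threshold are all hypothetical choices made for illustration, and the `train` and `evaluate` callables stand in for whatever policy-learning code is actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class CurriculumLog:
    """Records which dataset each training pass used, in order."""
    stages: List[str] = field(default_factory=list)


def run_curriculum(
    sim_levels: Sequence[str],
    real_demos: str,
    train: Callable[[str], None],
    evaluate: Callable[[str], float],
    threshold: float = 0.5,          # hypothetical advancement criterion
    max_passes_per_level: int = 3,   # hypothetical cap on passes per level
) -> CurriculumLog:
    """Train on simulated levels from easiest to hardest, then fine-tune on real demos.

    `sim_levels` is assumed to be ordered by increasing augmentation strength.
    A level is repeated until `evaluate` reaches `threshold` or the pass cap is hit.
    """
    log = CurriculumLog()
    for level in sim_levels:
        for _ in range(max_passes_per_level):
            train(level)
            log.stages.append(level)
            if evaluate(level) >= threshold:
                break  # good enough on this level, move to the next one
    # Final stage: fine-tune once on the small real-world dataset.
    train(real_demos)
    log.stages.append(real_demos)
    return log


if __name__ == "__main__":
    # Toy stand-ins: each training pass raises a scalar "skill" value.
    skill = {"value": 0.0}

    def toy_train(dataset: str) -> None:
        skill["value"] += 0.3

    def toy_evaluate(dataset: str) -> float:
        return min(skill["value"], 1.0)

    log = run_curriculum(
        ["sim_original", "sim_light_aug", "sim_heavy_aug"],  # hypothetical names
        "real_few_demos",
        toy_train,
        toy_evaluate,
    )
    print(log.stages)
```

Passing the training and evaluation steps in as callables keeps the schedule independent of any particular policy architecture or simulator, so the same loop could in principle wrap a real training routine.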