16 May 2024 | Yunfan Jiang, Chen Wang, Ruohan Zhang, Jiajun Wu, Li Fei-Fei
TRANSIC is a novel, holistic human-in-the-loop method for sim-to-real policy transfer in complex, contact-rich manipulation tasks. The key challenge in such transfer is closing the simulation-to-reality (sim-to-real) gaps, which include perception gaps, embodiment mismatch, controller inaccuracy, and limited dynamics realism. TRANSIC uses human intervention and online correction to learn a residual policy that complements the base policy trained in simulation. This allows the robot to complete tasks in the real world even with limited real-world data. The method is evaluated on four benchmark tasks (Stabilize, Reach and Grasp, Insert, and Screw) using a Franka Emika 3 robot.
TRANSIC outperforms traditional sim-to-real methods and interactive imitation learning approaches, achieving higher success rates while requiring less real-world data. It is also robust to different types of sim-to-real gaps and scales with human effort. In addition, it exhibits properties such as generalization to unseen objects, effective gating, policy robustness, and the ability to solve long-horizon manipulation tasks.
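To make the idea of a residual policy with gating concrete, the following is a minimal sketch and not the authors' implementation. It assumes the base policy and the residual policy each output an action vector of the same length, and that a gate outputs a probability that a correction is needed. The function name `combine_actions`, the `gate_prob` input, and the `threshold` value are all hypothetical, introduced only for illustration.

```python
from typing import List, Sequence


def combine_actions(
    base_action: Sequence[float],
    residual_action: Sequence[float],
    gate_prob: float,
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[float]:
    """Combine a simulation-trained base action with a learned correction.

    The residual is added only when the gate signals that a correction
    is needed; otherwise the base action is executed unchanged.
    """
    if len(base_action) != len(residual_action):
        raise ValueError("base and residual actions must have the same length")
    if gate_prob < threshold:
        # Gate closed: trust the base policy trained in simulation.
        return list(base_action)
    # Gate open: apply the correction on top of the base action.
    return [b + r for b, r in zip(base_action, residual_action)]


if __name__ == "__main__":
    base = [1.0, 2.0]
    residual = [0.5, -0.25]
    print(combine_actions(base, residual, gate_prob=0.9))
    print(combine_actions(base, residual, gate_prob=0.1))
```

Under these assumptions, the base policy stays untouched after simulation training, and only the small correction and its gate would need to be fitted from real-world human interventions, which is one way to read the claim that little real-world data is required.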