ScrewMimic: Bimanual Imitation from Human Videos with Screw Space Projection

6 May 2024 | Arpit Bahety, Priyanka Mandikal, Ben Abbatematteo, Roberto Martín-Martín
SCREWMIMIC is a novel framework for learning bimanual manipulation from human video demonstrations. It models the interaction between the two hands as a screw motion, defining a new action space called screw actions. This approach simplifies complex bimanual tasks by projecting the relative motion between the hands onto a 1-DoF screw joint, enabling efficient learning and fine-tuning. SCREWMIMIC uses a single human demonstration to extract an initial screw action, which is then refined through self-supervised policy fine-tuning.

The framework includes a perceptual module that interprets the human demonstration, a prediction model that generates screw actions from object point clouds, and a self-supervised fine-tuning algorithm that iteratively improves performance.

Experiments show that SCREWMIMIC outperforms baselines in learning bimanual tasks, achieving high success rates across six real-world tasks. The framework is robust to noisy demonstrations and generalizes to new object poses. The screw action representation enables efficient exploration and successful bimanual manipulation, with policies refined iteratively through self-supervised learning. The work highlights the potential of screw space projection for bimanual manipulation, offering a scalable and effective solution for robot learning from human demonstrations.
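To make the screw action idea concrete, the sketch below shows the standard rigid-body math behind a 1-DoF screw motion: a rotation by angle theta about an axis with direction `axis` passing through `point`, combined with a translation `d` along that axis. This is a minimal illustration of the general concept the summary refers to, not the paper's actual implementation; the axis, point, and angle values in the example are hypothetical.

```python
import numpy as np

def rodrigues(axis, theta):
    """Rotation matrix for a rotation by `theta` about the unit vector `axis`."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def screw_transform(axis, point, theta, d):
    """Homogeneous transform for a 1-DoF screw motion:
    rotate by `theta` about the line through `point` with direction `axis`,
    then translate by `d` along that axis."""
    axis = axis / np.linalg.norm(axis)
    R = rodrigues(axis, theta)
    t = (np.eye(3) - R) @ point + d * axis
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical example: the motion of one hand relative to the other for a
# cap-twisting-like task, a 90-degree rotation about a vertical axis with no translation.
axis = np.array([0.0, 0.0, 1.0])    # screw axis direction (illustrative)
point = np.array([0.1, 0.0, 0.3])   # a point on the axis (illustrative)
T_rel = screw_transform(axis, point, theta=np.pi / 2, d=0.0)
print(T_rel)
```

Under this parameterization, a single axis, a point on it, a rotation angle, and a translation along the axis fully describe the relative motion of one hand with respect to the other, which is why a screw action space drastically reduces the dimensionality of bimanual exploration.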