5 Jun 2024 | Xiaoyu Zhang, Matthew Chang, Pranav Kumar, Saurabh Gupta
The paper "Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning" addresses the issue of compounding execution errors in imitation learning, where small mistakes lead to out-of-distribution states and cause the robot to fail. To mitigate this, the authors propose Diffusion Meets DAgger (DMD), a method that uses diffusion models to synthesize out-of-distribution samples and augment the expert demonstration dataset. This approach improves the sample efficiency of imitation learning without the need for manual data collection.
DMD is designed for eye-in-hand setups, where images come from a camera mounted on the robot's hand. A diffusion model is trained to generate novel views in the neighborhood of the expert demonstrations: conditioned on a reference image and a relative camera transformation, it synthesizes the perturbed view the wrist camera would see from a slightly displaced pose. Each synthetic view is paired with a corrective action label that steers the robot back toward the expert trajectory, and the resulting augmented dataset is combined with the original demonstrations to train the policy. The diffusion model itself is trained on task and play data.
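The conditioning interface might look roughly like the sketch below; the `NovelViewDiffusion` class, its `denoiser` argument, and the `relative_transform` helper are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def relative_transform(translation, yaw):
    """Build a 4x4 SE(3) matrix for a small camera perturbation
    (translation in meters, yaw rotation in radians)."""
    T = np.eye(4)
    c, s = np.cos(yaw), np.sin(yaw)
    T[:3, :3] = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    T[:3, 3] = translation
    return T

class NovelViewDiffusion:
    """Hypothetical wrapper around a view-synthesis diffusion model,
    fine-tuned on task and play data from the eye-in-hand camera."""

    def __init__(self, denoiser, num_steps=50):
        self.denoiser = denoiser   # image- and pose-conditioned denoiser
        self.num_steps = num_steps

    def sample(self, reference_image, transform):
        """Render the view the wrist camera would see after moving by
        `transform` relative to its pose in `reference_image`."""
        x = np.random.randn(*reference_image.shape)  # start from pure noise
        for t in reversed(range(self.num_steps)):
            # Every denoising step conditions on the reference image and
            # the relative camera transform.
            x = self.denoiser(x, t, reference_image, transform)
        return x
```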
The paper evaluates DMD on four tasks: pushing, stacking, pouring, and hanging a shirt, with significant improvements over vanilla behavior cloning (BC) across the board. In pushing, DMD reaches an 80% success rate with only 8 expert demonstrations, versus 20% for BC. In stacking, DMD succeeds 92% of the time on average across 5 cups, while BC succeeds only 40% of the time. For pouring coffee beans, DMD transfers to another cup successfully 80% of the time, and for hanging a shirt it attains a 90% success rate.
The authors also compare DMD to SPARTN, a NeRF-based augmentation method, and find that DMD outperforms it by 50%. They further test DMD on the diverse in-the-wild dataset from the Universal Manipulation Interface (UMI) paper, showing that it improves performance even when the training set already contains a large number of diverse demonstrations and evaluation involves novel objects in novel environments.
Overall, DMD provides a robust and efficient solution for eye-in-hand imitation learning, improving the generalization and performance of policies trained with few expert demonstrations.