3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

8 Jun 2024 | Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, Huazhe Xu
3D Diffusion Policy (DP3) is a novel visual imitation learning algorithm that integrates 3D visual representations with diffusion policies, achieving strong performance across diverse simulation and real-world tasks. DP3 builds a compact 3D representation from sparse point clouds using an efficient point encoder. Across 72 simulation tasks, DP3 handles most tasks with just 10 demonstrations, outperforming baselines by 24.2%. In 4 real-robot tasks, DP3 achieves an 85% success rate with only 40 demonstrations and generalizes strongly across spatial layouts, viewpoints, appearances, and object instances. DP3 also deploys more safely in the real world, rarely violating safety requirements, whereas baseline methods frequently do.

Architecturally, DP3 uses a lightweight MLP encoder to process point clouds into compact 3D features and a diffusion-based backbone to generate action sequences conditioned on those features. The result is an efficient, effective, and generalizable policy with faster inference than 2D image-based diffusion policies. It performs well in both high-dimensional and low-dimensional control tasks and handles complex tasks with minimal human demonstration data.

The authors attribute DP3's success to its 3D representations and careful design: their compact point-cloud features outperform alternative 3D representations and are better suited to conditioning diffusion policies. Evaluated on a diverse set of simulation and real-world tasks, DP3 demonstrates both universality and effectiveness. The code and videos are available on the project's GitHub page.
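To make the two-stage architecture concrete, here is a minimal PyTorch sketch of the data flow described above: a per-point MLP with max-pooling compresses a sparse point cloud into one compact feature vector, and a denoising network conditioned on that feature (plus robot state) iteratively refines Gaussian noise into an action sequence. This is an illustrative reconstruction, not the authors' code; the class names (PointEncoder, NoisePredictor), layer sizes, 10-step loop, and the crude DDPM-style update are assumptions made for brevity, and the real implementation uses a proper diffusion sampler with scheduled coefficients.

```python
# Illustrative sketch of DP3's data flow -- not the authors' implementation.
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Lightweight MLP point encoder: a per-point MLP followed by
    max-pooling compresses a sparse point cloud into one compact vector."""
    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.LayerNorm(64), nn.ReLU(),
            nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(),
            nn.Linear(128, feature_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3) sparse point cloud
        per_point = self.mlp(points)         # (batch, num_points, feature_dim)
        return per_point.max(dim=1).values   # order-invariant pooling

class NoisePredictor(nn.Module):
    """Stand-in for the diffusion backbone: predicts the noise added to an
    action sequence, conditioned on the 3D feature and robot state."""
    def __init__(self, action_dim: int, horizon: int,
                 feature_dim: int = 64, state_dim: int = 7):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(horizon * action_dim + feature_dim + state_dim + 1, 256),
            nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, horizon * action_dim),
        )

    def forward(self, noisy_actions, t, cond):
        # noisy_actions: (batch, horizon, action_dim); t: (batch, 1) timestep
        x = torch.cat([noisy_actions.flatten(1), cond, t], dim=-1)
        return self.net(x).view(-1, self.horizon, self.action_dim)

# Toy denoising loop (dimensions are assumptions: 7-DoF actions, horizon 4).
encoder = PointEncoder()
predictor = NoisePredictor(action_dim=7, horizon=4)
points = torch.randn(1, 512, 3)   # a sparse point-cloud observation
state = torch.randn(1, 7)         # robot proprioceptive state
cond = torch.cat([encoder(points), state], dim=-1)

actions = torch.randn(1, 4, 7)    # start from pure noise
for step in reversed(range(10)):  # iteratively denoise into an action sequence
    t = torch.full((1, 1), float(step) / 10)
    eps = predictor(actions, t, cond)
    actions = actions - 0.1 * eps  # crude update; a real DDPM/DDIM sampler
                                   # uses noise-schedule coefficients here
```

Max-pooling makes the encoder invariant to point ordering, and compressing the whole cloud into a single small vector is what keeps the conditioning, and hence inference, cheap relative to policies that diffuse over high-dimensional 2D image features.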