ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation


3 Jun 2024 | Guanxing Lu, Zifeng Gao, Tianxing Chen, Wenxun Dai, Ziwei Wang, Yansong Tang
**Abstract:** Diffusion models have proven effective at generating complex distributions, including natural images and motion trajectories. However, their inference is slow because sampling requires multiple denoising steps, a cost that grows with high-dimensional observations. To address this, we propose ManiCM, a real-time robotic manipulation model that imposes consistency constraints on the diffusion process, enabling one-step inference. Specifically, we formulate a consistent diffusion process in the robot action space conditioned on point cloud inputs, ensuring that the original action can be denoised directly from any point along the ODE trajectory. We design a consistency distillation technique that predicts the action sample directly, which converges faster on the low-dimensional action manifold. Evaluations on 31 robotic manipulation tasks from Adroit and MetaWorld show that ManiCM accelerates the state-of-the-art method by 10x in average inference speed while maintaining competitive success rates.

**Introduction:** Designing robots that handle diverse manipulation tasks is a long-standing challenge. Previous approaches have explored various architectures, including convolutional networks, transformers, and generative models such as diffusion models. Diffusion-based policies are particularly effective at modeling high-dimensional robotic trajectories, but their iterative sampling makes them slow at runtime. Recent efforts use hierarchical sampling to speed up inference, but determining the hierarchy across different domains remains challenging. Consistency models, first introduced for image generation, enable efficient sampling by mapping any point on the probability-flow ODE trajectory directly to the clean sample. We leverage this idea to propose ManiCM, a real-time 3D diffusion policy that generates robot actions in a single step.

**Method:** ManiCM tackles the low decision-making efficiency of 3D manipulation diffusion models with a manipulation consistency model. We choose DP3 as the underlying diffusion model and distill its knowledge into a single-step sampler. The model conditions on 3D point cloud observations and employs a manipulation self-consistency function that predicts the action sample directly, leading to faster convergence. The consistency distillation loss aligns the denoised outputs of the online and target networks, enforcing self-consistency along the ODE trajectory; a minimal sketch of this training step is given below.
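The paper summary above does not include source code, so the following is a minimal PyTorch-style sketch of the consistency distillation step as described. All names here (the `c_skip`/`c_out` coefficients in a Karras-style parameterization, the `solver_step` helper, the noise `schedule`) are hypothetical illustrations under assumed conventions, not ManiCM's actual implementation.

```python
# Hypothetical sketch of manipulation consistency distillation (not the
# authors' code). Assumed shapes: actions (B, horizon, action_dim),
# timesteps t (B,), obs_cond an encoded point-cloud feature, and
# schedule a 1-D ascending tensor of noise levels starting near EPS.
import torch
import torch.nn.functional as F

SIGMA_DATA, EPS = 0.5, 0.002  # assumed Karras-style constants

def c_skip(t):
    # Boundary-preserving coefficient: c_skip(EPS) = 1, so f(a, EPS) = a.
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t):
    # Complementary coefficient: c_out(EPS) = 0 at the boundary.
    return SIGMA_DATA * (t - EPS) / torch.sqrt(SIGMA_DATA**2 + t**2)

def consistency_fn(net, a_t, t, obs_cond):
    # f(a_t, t | o) = c_skip(t) * a_t + c_out(t) * F(a_t, t | o):
    # maps any point on the ODE trajectory straight to the clean action.
    return (c_skip(t)[:, None, None] * a_t
            + c_out(t)[:, None, None] * net(a_t, t, obs_cond))

def solver_step(teacher, a, t_from, t_to, obs_cond):
    # One Euler step on the probability-flow ODE, assuming the frozen
    # teacher (DP3 in the paper) returns a clean-action estimate; this
    # stands in for whatever sampler the teacher actually uses.
    denoised = teacher(a, t_from, obs_cond)
    d = (a - denoised) / t_from[:, None, None]
    return a + (t_to - t_from)[:, None, None] * d

def distillation_loss(online, target, teacher, actions, obs_cond, schedule):
    # Perturb an expert action to a point a_{t_{n+1}} on the forward process.
    n = torch.randint(0, len(schedule) - 1, (actions.shape[0],))
    t_cur, t_next = schedule[n], schedule[n + 1]
    a_next = actions + t_next[:, None, None] * torch.randn_like(actions)

    # Run the teacher one ODE step backward, then require the online net
    # at t_{n+1} and the target net at t_n to denoise to the same action.
    with torch.no_grad():
        a_cur = solver_step(teacher, a_next, t_next, t_cur, obs_cond)
        ref = consistency_fn(target, a_cur, t_cur, obs_cond)
    pred = consistency_fn(online, a_next, t_next, obs_cond)
    return F.mse_loss(pred, ref)
```

In practice the target network would typically be an exponential moving average of the online network, mirroring how consistency models are usually trained.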
**Experiments:** We evaluate ManiCM on 31 tasks from Adroit and MetaWorld. ManiCM delivers a 10x average inference speedup over the state-of-the-art baseline while maintaining competitive success rates. Ablation studies and qualitative comparisons further validate its effectiveness and efficiency.

**Conclusion:** ManiCM is a real-time 3D diffusion policy that leverages consistency models to accelerate robotic manipulation. Extensive experiments confirm its effectiveness and efficiency, making it a promising approach for real-time closed-loop control in robotics.
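To make the one-step inference claim concrete, here is how the distilled policy from the sketch above could be queried at deployment time; it continues the hypothetical code above (reusing `consistency_fn` and the `torch` import), and all names remain assumptions.

```python
@torch.no_grad()
def sample_action(policy, obs_cond, t_max, horizon, action_dim, device="cpu"):
    # A single network evaluation replaces the teacher's iterative
    # denoising, which is where the reported ~10x average speedup comes from.
    t = torch.full((1,), t_max, device=device)
    a_T = t[:, None, None] * torch.randn(1, horizon, action_dim, device=device)
    return consistency_fn(policy, a_T, t, obs_cond)  # one call, clean action
```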