Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation


28 Jun 2024 | Aaditya Prasad, Kevin Lin, Jimmy Wu, Linqi Zhou, Jeannette Bohg
Consistency Policy is a faster and equally powerful alternative to Diffusion Policy for learning visuomotor robot control. It is distilled from a pretrained Diffusion Policy by enforcing self-consistency along the teacher's learned trajectories. Consistency Policy speeds up inference by an order of magnitude over the fastest alternative method while maintaining competitive success rates across six simulation tasks and three real-world tasks. It is also robust to the quality of the pretrained Diffusion Policy, which reduces the need for extensive testing of the pretrained model.

Consistency Policy builds on the Consistency Trajectory Model (CTM) framework, which enforces self-consistency along the probability-flow ODE trajectory: the student model is trained to predict the same output when given two distinct points along the same trajectory. To adapt this approach to the robotics domain, the diffusion framework used in Diffusion Policy is replaced with EDM, an analogous multi-step framework more commonly used for consistency distillation. The teacher model is first trained under the EDM framework and then distilled using an adaptation of the CTM objective. Because the distilled student can generate an action sequence in a single denoising step, Consistency Policy achieves much faster inference than Diffusion Policy while retaining competitive success rates; it also supports a 3-step inference mode that trades some speed for greater accuracy. The key design decisions behind this performance are the choice of consistency objective, a reduced initial sample variance, and the choice of preset chaining steps.
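To make the self-consistency idea concrete, here is a minimal PyTorch sketch of a CTM-style distillation step, not the authors' implementation: the callables student, ema_student, and teacher_ode_step, their signatures, and the sigma ranges are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ctm_self_consistency_loss(student, ema_student, teacher_ode_step, actions, obs):
    """One distillation step (sketch): the student should map two points on the
    same teacher ODE trajectory (times t and u, with u < t) to the same output
    at an earlier stop time s. Noise schedule and signatures are assumptions."""
    batch = actions.shape[0]
    device = actions.device

    # Sample times t > u >= s along an EDM-style noise schedule (assumed range).
    t = torch.rand(batch, device=device) * 79.0 + 1.0
    u = torch.rand(batch, device=device) * (t - 0.002) + 0.002
    s = torch.full_like(t, 0.002)  # stop time near zero

    # Noise the ground-truth action sequence (B, horizon, action_dim) to time t.
    noise = torch.randn_like(actions)
    x_t = actions + t.view(-1, 1, 1) * noise

    # Frozen EDM teacher integrates the ODE from t down to u.
    with torch.no_grad():
        x_u = teacher_ode_step(x_t, t, u, obs)

    # Student jumps from (x_t, t) to s; an EMA copy jumps from (x_u, u) to s.
    pred_from_t = student(x_t, t, s, obs)
    with torch.no_grad():
        pred_from_u = ema_student(x_u, u, s, obs)

    # Self-consistency: both jumps along the same trajectory should agree.
    return F.mse_loss(pred_from_t, pred_from_u)
```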
In real-world experiments, Consistency Policy was evaluated on three tasks: Trash Clean Up, Plug Insertion, and Microwave. It performed strongly on these tasks with significantly lower inference times than the DDiM variant of Diffusion Policy; on the Microwave task it kept its inference-speed advantage but succeeded slightly less often than DDiM.

Ablation studies showed that the choice of consistency objective, the initial sample variance, and the preset chaining steps all significantly affect performance. The results indicated that the third of the compared consistency objectives worked best, while the full CTM objective was far more computationally expensive to train than CTM-local and Consistency Distillation.

Overall, Consistency Policy provides a faster and equally powerful alternative to Diffusion Policy for learning visuomotor robot control, with significant improvements in inference speed and robustness to the quality of the pretrained model.
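The single-step and 3-step inference modes can be sketched as follows, again as an assumption-laden illustration rather than the paper's code: the student signature, the preset chaining sigmas, and the re-noising scheme are hypothetical.

```python
import torch

@torch.no_grad()
def sample_actions(student, obs, horizon, action_dim, chain_sigmas=(80.0, 2.0, 0.5)):
    """Few-step sampling sketch: start from Gaussian noise at the largest noise
    level, let the student jump toward sigma ~ 0, and optionally re-noise to a
    few preset intermediate levels (3-step mode). All values are illustrative."""
    device = next(student.parameters()).device  # assumes student is an nn.Module
    sigma_min = 0.002

    # Initial sample drawn at the largest sigma in the chain.
    x = chain_sigmas[0] * torch.randn(1, horizon, action_dim, device=device)

    for i, sigma in enumerate(chain_sigmas):
        t = torch.full((1,), sigma, device=device)
        s = torch.full((1,), sigma_min, device=device)
        # Student maps the noisy sample at time t directly to the stop time s.
        x = student(x, t, s, obs)
        # Re-noise to the next preset chaining level, except after the last jump.
        if i + 1 < len(chain_sigmas):
            x = x + chain_sigmas[i + 1] * torch.randn_like(x)

    return x  # denoised action sequence

# Single-step inference is the same call with one chaining level:
# actions = sample_actions(student, obs, horizon=16, action_dim=7, chain_sigmas=(80.0,))
```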