[slides] EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

**EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning**

**Authors:** Jingyun Yang, Zi-ang Cao, Congyue Deng, Rika Antonova, Shuran Song, Jeannette Bohg

**Institution:** Stanford University

**Abstract:** This paper introduces EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning. EquiBot combines SIM(3)-equivariant neural network architectures with diffusion models to ensure that learned policies are invariant to changes in scale, rotation, and translation. This enhances their applicability to unseen environments while retaining the benefits of diffusion-based policy learning, such as multi-modality and robustness. The method is evaluated in both simulation and real-world experiments, demonstrating improved data efficiency and generalization to novel scenarios.

**Key Contributions:**

1. **Equivariance in Diffusion Models:** EquiBot incorporates equivariance into the diffusion process, so that the learned policies are invariant to changes in scale, rotation, and translation.
2. **Data Efficiency:** The method reduces data requirements and improves generalization to novel scenarios, even with limited human demonstrations.
3. **Real-World Performance:** EquiBot successfully generalizes to unseen objects and scenes in real-world manipulation tasks, outperforming competing baselines.

**Methods:**

- **SIM(3)-Equivariant Neural Networks:** The network architecture is designed to be equivariant to 3D similarity transformations (rotation, translation, and uniform scaling), so that the policy's outputs scale, translate, and rotate with the inputs.
- **Diffusion Process:** The diffusion process models the conditional distribution of actions given observations, with each diffusion step being equivariant by construction.

**Experiments:**

- **Simulation:** EquiBot is evaluated on six simulation tasks, showing improved out-of-distribution generalization and data efficiency compared to vanilla diffusion policies and other equivariant policy architectures.
- **Real World:** EquiBot is tested on six real-world mobile manipulation tasks, demonstrating successful performance with unseen objects and scenes after learning from just 5 minutes of human demonstrations.

**Conclusion:** EquiBot is a novel approach for visuomotor policy learning that achieves generalizable and data-efficient learning in a wide range of robot manipulation tasks. It outperforms vanilla diffusion policies and prior methods using equivariant architectures, demonstrating strong generalization to unseen scenarios.
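
The equivariance property described under Methods can be checked numerically. The sketch below is a toy illustration and not EquiBot's network: `toy_target`, `toy_denoise_step`, `toy_sample`, and `apply_sim3` are hypothetical names, the "policy" is a hand-written function of a point cloud, and the "denoising" loop is a simple iterative refinement standing in for a diffusion sampler. It assumes the initial noisy action is transformed together with the scene, which is a choice made for this check and not necessarily how the paper handles noise. Because every step maps transformed inputs to correspondingly transformed outputs, the whole chain does too.

```python
import numpy as np

def toy_target(points):
    """Map an (N, 3) point cloud to a 3D position in a SIM(3)-equivariant way."""
    centroid = points.mean(axis=0)            # moves with translation
    offsets = points - centroid               # translation removed; rotates and scales with input
    dists = np.linalg.norm(offsets, axis=1)   # unchanged by rotation, proportional to scale
    scale = dists.mean()                      # assumes the points are not all identical
    logits = -dists / scale                   # invariant to scale, rotation, and translation
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return centroid + weights @ offsets       # invariant weights applied to equivariant offsets

def toy_denoise_step(points, action, step_size=0.5):
    """One refinement step; equivariant because it moves a point along a difference of points."""
    return action + step_size * (toy_target(points) - action)

def toy_sample(points, noisy_action, num_steps=5):
    """Chain of equivariant steps (a stand-in for a diffusion sampler)."""
    action = noisy_action
    for _ in range(num_steps):
        action = toy_denoise_step(points, action)
    return action

def random_rotation(rng):
    """Random 3x3 rotation matrix from a QR decomposition, forced to determinant +1."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def apply_sim3(x, s, R, t):
    """Apply x -> s * R x + t to a single point (3,) or a batch of points (N, 3)."""
    return s * x @ R.T + t

rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))     # synthetic observation
noisy_action = rng.normal(size=3)     # synthetic initial action sample
s, R, t = 2.5, random_rotation(rng), rng.normal(size=3)

# Run the chain and then transform, versus transform the inputs and then run the chain.
out_then_transform = apply_sim3(toy_sample(points, noisy_action), s, R, t)
transform_then_out = toy_sample(apply_sim3(points, s, R, t),
                                apply_sim3(noisy_action, s, R, t))
print(np.allclose(out_then_transform, transform_then_out))
```

The design choice worth noting is that the weights depend only on quantities that do not change under scale, rotation, or translation (distances divided by the mean distance), while the vectors they combine transform with the input. Equivariant architectures rely on the same separation between invariant and equivariant features, though with learned layers in place of this fixed rule.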

EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

1 Jul 2024 | Jingyun Yang, Zi-ang Cao*, Congyue Deng, Rika Antonova, Shuran Song, Jeannette Bohg