Render and Diffuse: Aligning Image and Action Spaces for Diffusion-based Behaviour Cloning

28 May 2024 | Vitalis Vosylius, Younggyo Seo, Jafar Uruc, Stephen James
Render and Diffuse (R&D) is a method that aligns image and action spaces for diffusion-based behaviour cloning in robotics. It unifies low-level robot actions and RGB observations within a single image space using virtual renders of the robot's 3D model. By iteratively updating these virtual renders with a learned denoising process, R&D simplifies the learning problem and introduces inductive biases that improve sample efficiency and spatial generalization.

R&D aligns observation and action spaces by rendering the robot in the configurations that the candidate actions would produce. This lets the model reason about the spatial implications of its actions and learn policies that map images to actions. A diffusion process then updates the rendered actions over several iterations until they closely match the training data. Several variants of R&D are introduced, differing in how the actions are updated during denoising.

The method is evaluated both in simulation and in the real world. In simulation, it is tested on 11 RLBench tasks, showing strong spatial generalization and sample efficiency compared to existing image-to-action methods. In the real world, it is applied to six everyday tasks, and it is also evaluated in a multi-task setting in which a single network learns multiple tasks, demonstrating its ability to generalize across tasks.

R&D has limitations: the computational cost of iterative rendering and forward propagation, reliance on camera calibration, and difficulty with tasks involving severe occlusions or inconsistent data. Nevertheless, it shows promise as a universal way of jointly representing RGB observations and actions. Future work includes extending the method to the full robot configuration and integrating it with image foundation models.
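
To make the iterative render-and-denoise loop concrete, here is a minimal Python sketch of how inference could be structured under this scheme. It is an illustration only, not the authors' implementation: the function names (render_and_diffuse, render_robot, denoiser), the 7-D pose parameterization, the fixed number of denoising steps, and the dummy stand-ins at the end are all assumptions made for exposition; a real system would also use a proper diffusion noise schedule and calibrated cameras for the virtual renders.

import numpy as np

def render_and_diffuse(rgb_obs, init_actions, render_robot, denoiser, num_steps=10):
    # Iteratively refine candidate gripper actions by rendering them into the
    # observation and letting a learned denoiser move them towards the data.
    #
    # rgb_obs      : (H, W, 3) array, current camera image of the scene.
    # init_actions : (N, 7) array, candidate gripper poses sampled from noise
    #                (3-D position + quaternion, purely for illustration).
    # render_robot : hypothetical callable(rgb, actions) -> (N, H, W, 3) images
    #                with a virtual render of the gripper at each candidate pose.
    # denoiser     : hypothetical callable(rendered, step) -> (N, 7) refined
    #                actions; stands in for the learned denoising network.
    actions = init_actions
    for step in reversed(range(num_steps)):
        # Align action and image spaces: draw the robot where each action would place it.
        rendered = render_robot(rgb_obs, actions)
        # One learned denoising step nudges the candidates towards demonstrated actions.
        actions = denoiser(rendered, step)
    return actions

# Shape-checking example with dummy stand-ins (not a real renderer or network):
dummy_render = lambda img, acts: np.repeat(img[None], len(acts), axis=0)
dummy_denoise = lambda rendered, step: np.random.randn(len(rendered), 7)
refined = render_and_diffuse(np.zeros((128, 128, 3)), np.random.randn(8, 7),
                             dummy_render, dummy_denoise)

Keeping the denoiser's input in image space is the source of the method's inductive bias: the network compares observations and candidate actions in the same visual frame, which is what the paper credits for its sample efficiency and spatial generalization.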