31 May 2024 | Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox*, Abhishek Gupta*
URDFormer is a pipeline for constructing realistic and diverse simulation environments from real-world images, enabling zero-shot real-to-sim-to-real transfer. It addresses the challenge of generating large-scale, physically and visually realistic simulation scenes, which are crucial for robotics and computer vision applications. The pipeline leverages controllable text-to-image generative models to create paired datasets of simulation scenes and corresponding realistic images. These pairs are then used to train an inverse model, URDFormer, which predicts the kinematic and dynamic structure of a scene from a single RGB image. The generated scenes are used to train robotic control policies that deploy robustly on real-world tasks such as articulated object manipulation. The pipeline is flexible and applies across different robots and tasks, demonstrating its versatility and effectiveness for real-world robotic learning.
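To make the output representation concrete, here is a minimal sketch of the kind of kinematic scene description the inverse model produces: a tree of links and joints serialized to URDF. The URDF tags (`link`, `joint`, `parent`, `child`, `axis`, `limit`) reflect the real format; everything else (`PredictedJoint`, `scene_to_urdf`, the cabinet example) is an illustrative placeholder, not the project's actual API, and the prediction step itself is replaced by a hand-written example.

```python
# Sketch: serializing a predicted kinematic tree (links + joints) to URDF.
# The model prediction is stubbed; only the URDF XML schema is the real format.
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class PredictedJoint:
    name: str
    parent: str        # parent link name
    child: str         # child link name
    joint_type: str    # "revolute" (hinge) or "prismatic" (slider)
    axis: tuple        # joint axis direction
    limits: tuple      # (lower, upper) in radians or meters

def scene_to_urdf(robot_name: str, links: list, joints: list) -> str:
    """Serialize a predicted kinematic tree into a URDF XML string."""
    robot = ET.Element("robot", name=robot_name)
    for link in links:
        ET.SubElement(robot, "link", name=link)
    for j in joints:
        joint = ET.SubElement(robot, "joint", name=j.name, type=j.joint_type)
        ET.SubElement(joint, "parent", link=j.parent)
        ET.SubElement(joint, "child", link=j.child)
        ET.SubElement(joint, "axis", xyz=" ".join(map(str, j.axis)))
        ET.SubElement(joint, "limit", lower=str(j.limits[0]),
                      upper=str(j.limits[1]), effort="10", velocity="1")
    return ET.tostring(robot, encoding="unicode")

# Example: a cabinet with one revolute door, as the model might predict
# from a single kitchen image.
print(scene_to_urdf(
    "cabinet",
    links=["body", "door"],
    joints=[PredictedJoint("door_hinge", "body", "door",
                           "revolute", (0, 0, 1), (0.0, 1.57))],
))
```

A URDF string like this can then be loaded into a physics simulator to instantiate the articulated scene for policy training.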