11 Aug 2023 | Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta, Emily Perez, Karl Pertsch, Jornell Quiambao, Kanishka Rao, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke, Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
The paper introduces the Robotics Transformer (RT-1), a model designed to learn and perform a wide range of robotic tasks from large, diverse datasets. RT-1 is trained on over 130k robot demonstrations collected over 17 months with a fleet of 13 robots in real-world kitchen environments. The model is designed for efficient, real-time control, running inference at 3 Hz. Key contributions include:
1. **Model Architecture**: RT-1 combines a pre-trained EfficientNet, TokenLearner, and Transformer to process images and natural language instructions and output discretized actions.
2. **Data Collection**: The dataset includes over 700 distinct task instructions, covering various objects and environments, ensuring broad task coverage and generalization.
3. **Performance and Generalization**: RT-1 achieves a 97% success rate on training tasks and generalizes well to new tasks, with improvements in robustness to distractors and backgrounds.
4. **Heterogeneous Data**: RT-1 can incorporate data from simulation and different robot types, improving performance on new scenarios without sacrificing original task performance.
5. **Long-Horizon Tasks**: RT-1 performs well in long-horizon tasks, executing sequences of skills with up to 50 steps.
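To make actions consumable by the Transformer, RT-1 discretizes each continuous action dimension into 256 uniform bins and treats the bin indices as output tokens. A minimal sketch of that tokenization step is below; the specific bounds and the 3-dimensional example action are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch of RT-1-style action discretization: each continuous
# action dimension is mapped to one of 256 uniform bins so the Transformer
# can predict actions as tokens. Bounds here are assumed, not from the paper.
N_BINS = 256

def discretize(action, low, high):
    """Map a continuous action vector to integer bin indices in [0, 255]."""
    scaled = (np.asarray(action, dtype=float) - low) / (high - low)  # -> [0, 1]
    return np.clip((scaled * N_BINS).astype(int), 0, N_BINS - 1)

def undiscretize(tokens, low, high):
    """Recover approximate continuous values from bin centers."""
    return low + (np.asarray(tokens, dtype=float) + 0.5) / N_BINS * (high - low)

# Example: a hypothetical 3-dim arm displacement bounded in [-1, 1] per axis.
low, high = -1.0, 1.0
tokens = discretize([0.0, 0.5, -1.0], low, high)      # e.g. [128, 192, 0]
recovered = undiscretize(tokens, low, high)           # close to the inputs
```

The round trip loses at most half a bin width per dimension, which is the quantization error the model accepts in exchange for a simple token-classification output head.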
The paper also discusses limitations, such as the model's reliance on imitation learning and its current inability to generalize to completely novel motions. Future work aims to expand the set of tasks and improve robustness to backgrounds and environments.