11 Aug 2023 | Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta, Emily Perez, Karl Pertsch, Jornell Quiambao, Kanishka Rao, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke, Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
The paper introduces the Robotics Transformer (RT-1), a model designed to learn and perform a wide range of robotic tasks from large, diverse datasets. RT-1 is trained on over 130k robot demonstrations collected over 17 months with a fleet of 13 robots in real-world kitchen environments. The model is designed for efficient, real-time control, running inference at 3 Hz. Key contributions include:
1. **Model Architecture**: RT-1 combines a pre-trained EfficientNet, TokenLearner, and Transformer to process images and natural language instructions and output discretized actions.
2. **Data Collection**: The dataset includes over 700 distinct task instructions, covering various objects and environments, ensuring broad task coverage and generalization.
3. **Performance and Generalization**: RT-1 achieves a 97% success rate on training tasks and generalizes well to new tasks, with improvements in robustness to distractors and backgrounds.
4. **Heterogeneous Data**: RT-1 can incorporate data from simulation and different robot types, improving performance on new scenarios without sacrificing original task performance.
5. **Long-Horizon Tasks**: RT-1 performs well in long-horizon tasks, executing sequences of skills with up to 50 steps.
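To make actions consumable by the Transformer, RT-1 discretizes each continuous action dimension into 256 uniform bins and treats the bin indices as output tokens. A minimal sketch of that tokenization step is below; the specific bounds and the 3-dimensional example action are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch of RT-1-style action discretization: each continuous
# action dimension is mapped to one of 256 uniform bins so the Transformer
# can predict actions as tokens. Bounds here are assumed, not from the paper.
N_BINS = 256

def discretize(action, low, high):
    """Map a continuous action vector to integer bin indices in [0, 255]."""
    scaled = (np.asarray(action, dtype=float) - low) / (high - low)  # -> [0, 1]
    return np.clip((scaled * N_BINS).astype(int), 0, N_BINS - 1)

def undiscretize(tokens, low, high):
    """Recover approximate continuous values from bin centers."""
    return low + (np.asarray(tokens, dtype=float) + 0.5) / N_BINS * (high - low)

# Example: a hypothetical 3-dim arm displacement bounded in [-1, 1] per axis.
low, high = -1.0, 1.0
tokens = discretize([0.0, 0.5, -1.0], low, high)      # e.g. [128, 192, 0]
recovered = undiscretize(tokens, low, high)           # close to the inputs
```

The round trip loses at most half a bin width per dimension, which is the quantization error the model accepts in exchange for a simple token-classification output head.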
The paper also discusses limitations, such as the model's reliance on imitation learning and its current inability to generalize to completely novel motions. Future work aims to expand the set of tasks and improve robustness to backgrounds and environments.