11 Aug 2023 | Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabir, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leaf, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Mallia, Deeksha Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta, Emily Perez, Karl Persich, Jornell Quiambao, Kanishka Rao, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke, Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
RT-1 is a Robotics Transformer designed to enable real-world robotic control at scale. By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets, and do so to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing, and speech recognition, it remains to be shown in robotics, where the generalization capabilities of models are particularly critical given the difficulty of collecting real-world robotic data. The paper presents a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. The model is trained and evaluated on a large-scale dataset collected with real robots performing real-world tasks, showing impressive generalization, robustness, and an ability to learn from diverse data.
The main challenges in building such models in robotics are assembling the right dataset and designing the right model. While data collection and curation are often the "unsung hero" of many large-scale machine learning projects, this is especially true in robotics, where datasets tend to be robot-specific and gathered manually. The paper presents a dataset containing over 130k episodes spanning over 700 tasks, collected over 17 months with a fleet of 13 robots. The model, RT-1, is designed to efficiently map high-dimensional inputs and outputs, including camera images, natural-language instructions, and motor commands, into compact token representations consumed by the Transformer, allowing for efficient inference at runtime and making real-time control feasible.
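The tokenization pipeline above can be sketched in miniature. This is purely illustrative, not the paper's implementation: RT-1 uses an instruction-conditioned EfficientNet image encoder, compresses its per-image tokens down to 8 with a TokenLearner module, and discretizes each action dimension into 256 bins. Here the learned encoder is replaced by random projections, and the patch-token count and embedding width are toy values chosen for the sketch.

```python
import numpy as np

N_PATCH_TOKENS = 81    # patch tokens per image before compression (toy value)
N_LEARNED_TOKENS = 8   # compact tokens kept per image, as in RT-1's TokenLearner
EMBED_DIM = 32         # toy embedding width
N_ACTION_BINS = 256    # RT-1 discretizes each action dimension into 256 bins

def tokenize_image(rng):
    """Stand-in for a learned image encoder producing patch tokens.

    A real encoder would be a conv net conditioned on the instruction;
    here we just draw random token embeddings of the right shape.
    """
    return rng.standard_normal((N_PATCH_TOKENS, EMBED_DIM))

def token_learner(tokens):
    """Compress patch tokens to a few tokens via soft attention pooling."""
    # Each output token attends over all input tokens (toy query choice:
    # the first N_LEARNED_TOKENS inputs act as queries).
    scores = tokens[:N_LEARNED_TOKENS] @ tokens.T          # (8, 81)
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)          # softmax rows
    return weights @ tokens                                # (8, EMBED_DIM)

def discretize_action(action, low=-1.0, high=1.0):
    """Map each continuous action dimension to one of 256 integer bins."""
    clipped = np.clip(action, low, high)
    bins = (clipped - low) / (high - low) * (N_ACTION_BINS - 1)
    return np.round(bins).astype(int)
```

Keeping only 8 tokens per image is what makes attention over a history of frames cheap enough for real-time control: the Transformer's sequence length grows with the number of compact tokens, not with the raw patch count.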
The paper evaluates the performance of RT-1 on a variety of tasks, showing that it can perform over 700 training instructions at a 97% success rate and can generalize to new tasks, distractors, and backgrounds 25%, 36%, and 18% better than the next best baseline, respectively. This level of performance allows the model to execute very long-horizon tasks in the SayCan framework, with as many as 50 stages. The paper also shows that RT-1 can absorb data from simulation or even from other robot types, retaining performance on the original tasks while improving generalization to new scenarios.
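The long-horizon setup described above follows a simple structure: a high-level planner decomposes a long instruction into short skills, and the low-level policy (RT-1's role) executes each skill in turn, so overall success compounds across stages. A hedged sketch of that control loop, with hypothetical `plan_steps` and `execute_step` stand-ins (SayCan actually scores LLM-proposed skills against learned affordances):

```python
def plan_steps(goal: str) -> list[str]:
    """Hypothetical planner: split a long-horizon goal into short skills.

    SayCan uses an LLM scored against skill affordances; a fixed plan
    stands in for that here.
    """
    return ["find a sponge", "pick up the sponge", "bring it to the user"]

def execute_step(instruction: str) -> bool:
    """Hypothetical stand-in for rolling out the RT-1 policy on one skill."""
    return True  # pretend the policy succeeded

def run_long_horizon(goal: str) -> list[str]:
    """Execute planned skills in order, stopping at the first failure."""
    completed = []
    for step in plan_steps(goal):
        if not execute_step(step):
            break
        completed.append(step)
    return completed
```

Because every stage must succeed for the episode to succeed, a 97%-per-stage policy is roughly what makes 50-stage tasks plausible at all; a weaker policy would fail almost every long episode.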
The paper also evaluates the generalization capabilities of RT-1 across different data quantities and levels of data diversity, showing that data diversity has a higher impact on performance and generalization than data quantity. The paper concludes that RT-1 is a promising step towards large-scale robot learning with a data-absorbent model, but it comes with limitations, including the need for extensive data collection and the challenge of real-time control. Future work includes further exploration of the model's capabilities and the integration of more multi-robot datasets to enhance robot capabilities.