AUTORT: EMBODIED FOUNDATION MODELS FOR LARGE SCALE ORCHESTRATION OF ROBOTIC AGENTS

2 Jul 2024 | Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Isabel Leal, Edward Lee, Sergey Levine, Yao Lu, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Ted Xiao, Peng Xu, Steve Xu, Zhuo Xu
AutoRT is a system designed to scale up the deployment of operational robots in unseen scenarios with minimal human supervision. It leverages vision-language models (VLMs) for scene understanding and grounding, and large language models (LLMs) for proposing diverse, novel instructions to a fleet of robots. By tapping into the knowledge of foundation models, the system guides data collection, reasons effectively about autonomy trade-offs and safety, and significantly scales up data collection for robot learning.

At AutoRT's core is an LLM acting as a robot orchestrator, prescribing tasks to robots based on user prompts and environmental affordances. The process involves scene description, task proposal, and affordance filtering to determine which tasks to attempt; a sketch of this loop appears after this summary. The system also includes a robot constitution, a set of rules for safe and appropriate robot behavior that keeps proposed tasks compliant with high-level objectives and constraints.

The experimental evaluation demonstrates that AutoRT can propose instructions to over 20 robots across multiple buildings, collecting 77,000 real robot episodes via both teleoperation and autonomous policies. The collected data is shown to be more diverse than prior datasets and can be used to improve state-of-the-art robot learning models. The system also aligns robot behavior with human preferences by prompting and critiquing with the robot constitution.

AutoRT addresses challenges such as data diversity, task feasibility, and safety, making it a significant step toward scaling robot data collection and embodying foundation models in robotic systems. Its limitations include reliance on scripted and learned policies, communication bandwidth constraints, and the need for human supervision to ensure safety.
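To make the orchestration loop concrete, below is a minimal Python sketch of the scene-description, task-proposal, and affordance-filtering pipeline. All names here (describe_scene, propose_tasks, affordance_filter, the Policy enum) and the keyword heuristics are illustrative assumptions, not the authors' implementation; in the real system each stage is a VLM or LLM call running over a fleet of physical robots.

```python
# Hypothetical sketch of an AutoRT-style orchestration loop.
# Each stage below stubs out what the paper describes as a VLM or LLM call.

from dataclasses import dataclass
from enum import Enum
from typing import List


class Policy(Enum):
    AUTONOMOUS = "autonomous"  # collected with a learned policy
    SCRIPTED = "scripted"      # hand-coded pick-and-place primitive
    TELEOP = "teleop"          # routed to a human teleoperator
    REJECT = "reject"          # fails the affordance or constitution check


@dataclass
class Episode:
    robot_id: int
    instruction: str
    policy: Policy


# Example rules paraphrasing the flavor of the paper's robot constitution.
ROBOT_CONSTITUTION = [
    "do not interact with humans, animals, or other living things",
    "do not handle sharp or fragile objects",
    "only propose tasks a single robot arm can perform",
]


def describe_scene(camera_image) -> List[str]:
    """Stand-in for the VLM scene-description call: list visible objects."""
    return ["sponge", "soda can", "bag of chips"]


def propose_tasks(objects: List[str]) -> List[str]:
    """Stand-in for the LLM task-proposal call, conditioned on the scene;
    a trivial template keeps the sketch runnable."""
    return [f"pick up the {obj} and move it to the left" for obj in objects]


def affordance_filter(task: str, constitution: List[str]) -> Policy:
    """Stand-in for the LLM self-critique. The real system injects the
    constitution into the prompt and asks the LLM to accept, reject, or
    route each task; here a keyword check keeps the sketch runnable."""
    banned = ("human", "animal", "knife", "sharp")
    if any(word in task for word in banned):
        return Policy.REJECT
    if "move it" in task:
        return Policy.SCRIPTED  # simple enough for a scripted primitive
    return Policy.TELEOP        # harder tasks consume teleoperator time


def orchestrate(robot_id: int, camera_image=None) -> List[Episode]:
    """One pass of the orchestrator for a single robot in the fleet."""
    objects = describe_scene(camera_image)
    episodes = []
    for task in propose_tasks(objects):
        policy = affordance_filter(task, ROBOT_CONSTITUTION)
        if policy is not Policy.REJECT:
            episodes.append(Episode(robot_id, task, policy))
    return episodes


if __name__ == "__main__":
    # The real deployment ran this loop over 20+ robots in multiple buildings.
    for robot in range(3):
        for ep in orchestrate(robot):
            print(f"robot {ep.robot_id}: '{ep.instruction}' via {ep.policy.value}")
```

The design choice the sketch mirrors is that filtering happens before execution: tasks violating the constitution are dropped outright, feasible tasks are routed to the cheapest collection mode, and only the hardest ones are assigned to human teleoperators.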