Octo: An Open-Source Generalist Robot Policy

26 May 2024 | Dibya Ghosh*,1; Homer Walke*,1; Karl Pertsch*,1,2; Kevin Black*,1; Oier Mees*,1; Sudeep Dasari 3; Joey Hejna 2; Tobias Kreiman 1; Ria Doshi 1; Charles Xu 1; Jianlan Luo 1; You Liang Tan 1; Lawrence Yunliang Chen 1; Pannag Sanketi 4; Quan Vuong 4; Ted Xiao 4; Dorsa Sadigh 2; Chelsea Finn 2; Sergey Levine 1
Octo is an open-source generalist policy for robotic manipulation, designed to be flexible and scalable. It is a transformer-based policy pre-trained on 800k diverse robot episodes from the Open X-Embodiment dataset, the largest robot manipulation dataset to date. Octo supports flexible task and observation definitions and can be quickly finetuned to new observation and action spaces. The architecture consists of tokenizers for task descriptions and input observations, a transformer backbone, and readout heads that produce actions; this design allows efficient finetuning to new robot setups, including different action spaces and sensor configurations. Actions are decoded by a conditional diffusion head, which lets the policy predict continuous, multi-modal action distributions.
Experiments across 9 robotic platforms demonstrate Octo's versatility as a policy initialization that can be effectively finetuned to new observation and action spaces. The paper also includes detailed ablations to guide future research on building generalist robot models.
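To make the idea of a conditional diffusion decoding head more concrete, the following is a minimal sketch of a standard DDPM-style reverse sampling loop that turns Gaussian noise into an action, conditioned on an embedding from the transformer's readout. It is not Octo's implementation: the function names, the linear noise schedule, the 20 denoising steps, the 7-dimensional action, and the embedding values are all assumptions chosen for illustration. In particular, `toy_noise_predictor` is a hypothetical stand-in for the learned denoising network, used here only so the loop can run without a trained model.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus the cumulative products the sampler needs."""
    betas = [
        beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        for i in range(num_steps)
    ]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def toy_noise_predictor(noisy_action, alpha_bar, readout_embedding):
    """Hypothetical stand-in for the learned, conditioned denoising network.

    It pretends the "correct" action is the first entries of the readout
    embedding and returns the noise that would explain noisy_action.
    """
    target = readout_embedding[: len(noisy_action)]
    scale = math.sqrt(1.0 - alpha_bar)
    return [
        (x - math.sqrt(alpha_bar) * a) / scale
        for x, a in zip(noisy_action, target)
    ]


def sample_action(readout_embedding, action_dim, num_steps=20, seed=0):
    """Run the reverse diffusion process from pure noise to an action."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Start from a sample of standard Gaussian noise.
    action = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(action, alpha_bars[t], readout_embedding)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [
            (x - coef * e) / math.sqrt(alphas[t])
            for x, e in zip(action, eps)
        ]
        # Fresh noise is injected at every step except the last one.
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        action = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return action


# Made-up readout embedding; a real one would come from the transformer.
embedding = [0.3, -0.1, 0.5, 0.0, 0.2, -0.4, 1.0, 0.7]
action = sample_action(embedding, action_dim=7)
```

Because the stand-in predictor always points at a single target, every sample here ends at that target. A learned noise predictor is what would allow different noise draws to settle on different valid actions, which is the multi-modal behavior the summary refers to.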