Humanoid Locomotion as Next Token Prediction

29 Feb 2024 | Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik
This paper casts real-world humanoid locomotion as a next-token-prediction problem, analogous to language modeling. A causal transformer is trained to autoregressively predict sensorimotor trajectories, and the resulting controller enables a full-sized humanoid to walk zero-shot in San Francisco, without training in those environments. The training set is deliberately diverse, combining trajectories from neural network policies, model-based controllers, human motion capture, and YouTube videos of people. Because the model predicts both sensory and motor tokens, it can also learn from trajectories with missing modalities, such as videos without recorded actions, by substituting mask tokens for the absent data.

Experiments show that the learned policy walks on a variety of surfaces, generalizes to commands not seen during training such as walking backward, and remains capable even with limited training data. Performance also scales with larger datasets, longer context lengths, and larger model sizes. The results suggest that generative modeling of sensorimotor trajectories is a promising approach for learning real-world robot control.
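To make the setup concrete, below is a minimal PyTorch sketch of the core idea as described above: a causal transformer over interleaved observation and action tokens, with a learned mask token substituted for modalities a trajectory lacks (e.g., actions for video-derived data). This is not the authors' implementation; the dimensions, interleaving scheme, and module choices are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SensorimotorGPT(nn.Module):
    """Causal transformer over interleaved (observation, action) tokens (illustrative sketch)."""

    def __init__(self, obs_dim=32, act_dim=19, d_model=256,
                 n_layers=4, n_heads=4, max_steps=128):
        super().__init__()
        # Project continuous observations/actions into a shared token space.
        self.obs_in = nn.Linear(obs_dim, d_model)
        self.act_in = nn.Linear(act_dim, d_model)
        # Learned token that stands in for missing modalities (e.g., no actions).
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        self.pos_emb = nn.Parameter(torch.zeros(2 * max_steps, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # Regression heads for the next observation / next action.
        self.obs_out = nn.Linear(d_model, obs_dim)
        self.act_out = nn.Linear(d_model, act_dim)

    def forward(self, obs, act, act_missing=None):
        # obs: (B, T, obs_dim), act: (B, T, act_dim),
        # act_missing: (B, T) bool, True where the action is unavailable.
        o = self.obs_in(obs)
        a = self.act_in(act)
        if act_missing is not None:
            a = torch.where(act_missing.unsqueeze(-1),
                            self.mask_token.expand_as(a), a)
        # Interleave as (o_1, a_1, o_2, a_2, ...) -> sequence length 2T.
        B, T, D = o.shape
        x = torch.stack([o, a], dim=2).reshape(B, 2 * T, D)
        x = x + self.pos_emb[: 2 * T]
        causal = nn.Transformer.generate_square_subsequent_mask(2 * T).to(x.device)
        h = self.backbone(x, mask=causal).reshape(B, T, 2, D)
        # An observation token predicts the action that follows it;
        # an action token predicts the next observation.
        pred_act = self.act_out(h[:, :, 0])
        pred_obs = self.obs_out(h[:, :, 1])
        return pred_obs, pred_act


# Toy usage: a batch of 16-step trajectories, one of them action-free.
model = SensorimotorGPT()
obs = torch.randn(2, 16, 32)
act = torch.randn(2, 16, 19)
missing = torch.tensor([[False] * 16, [True] * 16])
pred_obs, pred_act = model(obs, act, missing)
```

In a setup like this, training would minimize a regression loss between predicted and ground-truth next tokens, skipping the loss at masked positions; at deployment, the model is conditioned on the observed history and the predicted action token is executed.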