Behavior Generation with Latent Actions

2024 | Seungjae Lee, Yibin Wang, Haritheja Etukuru, H. Jin Kim, Nur Muhammad Mahi Shafiullah, Lerrel Pinto
This paper introduces VQ-BeT, a model for behavior generation over continuous, multi-modal action spaces. VQ-BeT improves on prior models such as Behavior Transformers (BeT) and Diffusion Policies by tokenizing continuous actions with a hierarchical vector quantization module, which lets a single architecture handle both conditional and unconditional behavior generation. The model is trained and evaluated across simulated and real-world environments spanning robotic manipulation and autonomous driving.

VQ-BeT outperforms existing methods in both task performance and inference speed, achieving state-of-the-art results on multiple benchmarks. It captures multi-modal behavior more faithfully and handles long-horizon tasks more effectively than its predecessors, while running roughly 5x faster than Diffusion Policies in simulation and 25x faster in real-world settings. The paper also analyzes the design choices driving these results, in particular residual vector quantization and the offset heads that refine decoded actions, and argues that the approach scales to large behavior datasets. Overall, the results suggest VQ-BeT is a promising approach for generating diverse, accurate behaviors in complex, multi-modal environments.
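At the core of the tokenizer is residual vector quantization (RVQ): an action is snapped to its nearest entry in a coarse codebook, and the leftover error is quantized again by a second codebook, so each continuous action becomes a short sequence of discrete codes; a learned offset head then predicts a continuous correction to the decoded centroid. The sketch below illustrates the idea in NumPy; the codebook sizes, action dimensionality, and randomly initialized codebooks are illustrative assumptions, not the paper's implementation (which learns the codebooks and predicts the offset with a network).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the paper's codebook sizes and action
# dimensionality vary per environment.
ACTION_DIM = 2      # e.g., a 2-D end-effector delta
CODEBOOK_SIZE = 16  # entries per quantization level
NUM_LEVELS = 2      # hierarchical: coarse codebook + residual codebook

# Randomly initialized codebooks stand in for learned ones.
codebooks = [rng.normal(size=(CODEBOOK_SIZE, ACTION_DIM))
             for _ in range(NUM_LEVELS)]

def rvq_encode(action):
    """Residual VQ: quantize the action, then quantize the leftover error."""
    codes, residual = [], action.copy()
    for book in codebooks:
        idx = int(np.argmin(np.linalg.norm(book - residual, axis=1)))
        codes.append(idx)
        residual = residual - book[idx]  # pass the error to the next level
    return codes

def rvq_decode(codes):
    """Sum the selected entries across levels to reconstruct the action."""
    return sum(book[idx] for book, idx in zip(codebooks, codes))

action = np.array([0.7, -0.3])
codes = rvq_encode(action)
coarse = rvq_decode(codes)

# VQ-BeT additionally predicts a continuous offset to correct the
# residual quantization error; here we stand in the true residual
# for what the learned offset head would output.
offset = action - coarse
print(codes, coarse + offset)  # discrete tokens; corrected reconstruction
```

Stacking levels is what keeps the discretization fine-grained yet cheap: two codebooks of 16 entries cover 256 effective bins while each nearest-neighbor lookup stays small, and the offset head removes the remaining quantization error.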