28 Jun 2024 | Seungjae Lee, Yibin Wang, Haritheja Etukuru, H. Jin Kim, Nur Muhammad Mahi Shafiullah, Lerrel Pinto
The paper introduces the Vector-Quantized Behavior Transformer (VQ-BeT), a model for generating complex behaviors from labeled datasets. Unlike earlier methods that struggle with high-dimensional action spaces and long sequences, VQ-BeT uses hierarchical vector quantization to discretize continuous actions, which improves its ability to capture and generate multimodal behaviors. The model supports both conditional and unconditional generation and is evaluated across simulated and real-world environments, including robotic manipulation and autonomous driving. VQ-BeT outperforms state-of-the-art models such as Behavior Transformers (BeT) and Diffusion Policies in both task performance and inference speed, demonstrating its effectiveness at handling multimodal actions and long-range dependencies.
The paper also discusses the design choices that impact VQ-BeT's performance and its potential applications in real-world robotics and autonomous driving.
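
To make the idea of hierarchical vector quantization concrete, the following is a minimal sketch of two-level residual quantization in NumPy. It is an illustration under stated assumptions, not the paper's implementation: the codebooks here are random and fixed rather than learned, the quantization is applied directly to raw action vectors rather than to a learned latent, and the names `nearest_code` and `residual_quantize` are hypothetical.

```python
import numpy as np


def nearest_code(x, codebook):
    """Return, for each row of x, the index of the closest codebook row."""
    # Squared Euclidean distances, shape (n_samples, n_codes).
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def residual_quantize(x, codebooks):
    """Quantize x level by level; each codebook encodes the residual
    left over by the previous levels (hypothetical helper)."""
    residual = x.copy()
    recon = np.zeros_like(x)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        chosen = codebook[idx]
        recon += chosen      # accumulate the reconstruction
        residual -= chosen   # what the next level still has to explain
        indices.append(idx)
    # One discrete code per level for every sample: shape (n_samples, n_levels).
    return np.stack(indices, axis=1), recon


# Assumed toy data: 5 two-dimensional "actions" and two random codebooks.
rng = np.random.default_rng(0)
actions = rng.normal(size=(5, 2))
coarse = rng.normal(size=(8, 2))        # first-level (coarse) codes
fine = 0.25 * rng.normal(size=(8, 2))   # second-level (fine) codes

codes, recon = residual_quantize(actions, [coarse, fine])
print(codes.shape, recon.shape)
```

Each continuous action is mapped to a short tuple of discrete indices, one per level, and the sum of the selected code vectors approximates the original action. A sequence model can then predict these indices as a classification problem instead of regressing continuous values.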