QueST: Self-Supervised Skill Abstractions for Learning Continuous Control

23 Jul 2024 | Atharva Mete, Haotian Xue, Albert Wilcox, Yongxin Chen, Animesh Garg
The paper "Quantized Skill Transformer (QueST): Self-Supervised Skill Abstractions for Learning Continuous Control" addresses the challenge of generalization in robot learning by proposing a novel architecture that learns temporal action abstractions using latent variable models (LVMs). The authors hypothesize that learning these abstractions can help in acquiring low-level skills that are transferable to new tasks. To achieve this, QueST employs a quantized autoencoder to learn a flexible and expressive latent space, which is then used to train a skill prior that can predict actions conditioned on task embeddings and observations. Key contributions of the paper include: 1. **Quantized Autoencoder**: QueST uses a quantized autoencoder to map action sequences to a discrete latent space, enabling the model to capture variable-length motion primitives. 2. **Causal Inductive Bias**: The encoder is designed with causal convolutions to encourage the model to learn semantically useful representations by modeling the inherent causality in action data. 3. **Skill Prior**: A skill prior is trained to autoregressively predict skill tokens, allowing the model to reason about the dependencies between these tokens. 4. **Performance on Benchmarks**: QueST outperforms state-of-the-art baselines in multitask imitation learning and few-shot learning benchmarks, demonstrating its effectiveness in learning and transferring low-level skills across different tasks. The paper also includes a detailed experimental evaluation on various benchmarks, such as LIBERO and MetaWorld, and provides ablation studies to validate the key design choices of QueST. The results highlight the superior performance of QueST in multitask and few-shot learning settings, making it a promising approach for learning generalizable low-level skills in robotics.The paper "Quantized Skill Transformer (QueST): Self-Supervised Skill Abstractions for Learning Continuous Control" addresses the challenge of generalization in robot learning by proposing a novel architecture that learns temporal action abstractions using latent variable models (LVMs). The authors hypothesize that learning these abstractions can help in acquiring low-level skills that are transferable to new tasks. To achieve this, QueST employs a quantized autoencoder to learn a flexible and expressive latent space, which is then used to train a skill prior that can predict actions conditioned on task embeddings and observations. Key contributions of the paper include: 1. **Quantized Autoencoder**: QueST uses a quantized autoencoder to map action sequences to a discrete latent space, enabling the model to capture variable-length motion primitives. 2. **Causal Inductive Bias**: The encoder is designed with causal convolutions to encourage the model to learn semantically useful representations by modeling the inherent causality in action data. 3. **Skill Prior**: A skill prior is trained to autoregressively predict skill tokens, allowing the model to reason about the dependencies between these tokens. 4. **Performance on Benchmarks**: QueST outperforms state-of-the-art baselines in multitask imitation learning and few-shot learning benchmarks, demonstrating its effectiveness in learning and transferring low-level skills across different tasks. The paper also includes a detailed experimental evaluation on various benchmarks, such as LIBERO and MetaWorld, and provides ablation studies to validate the key design choices of QueST. 
The results highlight the superior performance of QueST in multitask and few-shot learning settings, making it a promising approach for learning generalizable low-level skills in robotics.
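To make the first two contributions concrete, here is a minimal PyTorch sketch of a quantized skill autoencoder: a causal-convolution encoder compresses a chunk of continuous actions into a shorter sequence of discrete codebook indices (skill tokens), and a decoder reconstructs the actions from the quantized codes. The layer sizes, chunk length, and plain VQ-VAE-style codebook with a straight-through estimator are illustrative assumptions, not the paper's exact design.

```python
# Sketch of a quantized skill autoencoder, assuming a plain VQ-VAE-style
# codebook with a straight-through estimator and toy layer sizes. The paper's
# exact quantization scheme, widths, and losses may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConv1d(nn.Module):
    """1D convolution that only sees past timesteps via left padding."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.left_pad = kernel_size - 1
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, stride=stride)

    def forward(self, x):                       # x: (B, C, T)
        return self.conv(F.pad(x, (self.left_pad, 0)))


class SkillAutoencoder(nn.Module):
    """Maps an action chunk to discrete skill tokens and reconstructs it."""
    def __init__(self, action_dim=7, hidden=128, codebook_size=512, downsample=4):
        super().__init__()
        self.encoder = nn.Sequential(
            CausalConv1d(action_dim, hidden, kernel_size=5),
            nn.GELU(),
            CausalConv1d(hidden, hidden, kernel_size=5, stride=downsample),
        )
        self.codebook = nn.Embedding(codebook_size, hidden)
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(hidden, hidden, kernel_size=downsample, stride=downsample),
            nn.GELU(),
            nn.Conv1d(hidden, action_dim, kernel_size=1),
        )

    def quantize(self, z):                      # z: (B, hidden, T')
        z = z.permute(0, 2, 1)                  # (B, T', hidden)
        flat = z.reshape(-1, z.size(-1))
        cb = self.codebook.weight               # (K, hidden)
        # squared Euclidean distance to every codebook entry
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ cb.t()
                 + cb.pow(2).sum(1))
        tokens = dists.argmin(dim=-1).view(z.shape[:2])   # discrete skill tokens
        z_q = self.codebook(tokens)
        z_q = z + (z_q - z).detach()            # straight-through gradient
        return z_q.permute(0, 2, 1), tokens

    def forward(self, actions):                 # actions: (B, T, action_dim)
        z = self.encoder(actions.permute(0, 2, 1))
        z_q, tokens = self.quantize(z)
        recon = self.decoder(z_q).permute(0, 2, 1)
        return recon, tokens


# Toy usage: a chunk of 32 seven-dimensional actions becomes 8 skill tokens.
model = SkillAutoencoder()
actions = torch.randn(2, 32, 7)
recon, tokens = model(actions)
loss = F.mse_loss(recon, actions)               # plus codebook/commitment terms in practice
print(tokens.shape)                             # torch.Size([2, 8])
```

Because the second convolution is strided, each token summarizes several consecutive actions, which is how the discrete bottleneck ends up representing short motion primitives.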
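And a corresponding sketch of the skill-prior stage: a small decoder-only transformer that autoregressively predicts the next skill token. Here the conditioning is a single task/observation embedding added to every position, which is an assumption for illustration; the paper has its own conditioning scheme, and the sizes and positional embeddings below are likewise placeholders. At inference, the sampled tokens would be decoded back into actions by the autoencoder above.

```python
# Sketch of an autoregressive skill prior over discrete skill tokens.
# Conditioning via a single added embedding vector is an assumption; the
# paper conditions on task and observation encodings in its own way.
import torch
import torch.nn as nn


class SkillPrior(nn.Module):
    def __init__(self, codebook_size=512, hidden=128, n_heads=4, n_layers=2, max_tokens=32):
        super().__init__()
        self.bos = codebook_size                          # extra id used as start token
        self.token_emb = nn.Embedding(codebook_size + 1, hidden)
        self.pos_emb = nn.Parameter(torch.zeros(1, max_tokens, hidden))
        layer = nn.TransformerEncoderLayer(hidden, n_heads, 4 * hidden, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(hidden, codebook_size)

    def forward(self, tokens, cond):                      # tokens: (B, T'), cond: (B, hidden)
        x = self.token_emb(tokens) + self.pos_emb[:, :tokens.size(1)] + cond.unsqueeze(1)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(x, mask=causal)                   # causal self-attention
        return self.head(h)                               # next-token logits, (B, T', K)


# Toy usage: teacher-force the skill tokens produced by the autoencoder sketch.
prior = SkillPrior()
tokens = torch.randint(0, 512, (2, 8))
cond = torch.randn(2, 128)                                # e.g. fused task + observation embedding
inputs = torch.cat([torch.full((2, 1), prior.bos), tokens[:, :-1]], dim=1)
logits = prior(inputs, cond)
loss = nn.functional.cross_entropy(logits.reshape(-1, 512), tokens.reshape(-1))
```

Predicting tokens one at a time is what lets the prior capture dependencies between skills within a trajectory, rather than emitting each primitive independently.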