QueST: Self-Supervised Skill Abstractions for Learning Continuous Control

23 Jul 2024 | Atharva Mete¹, Haotian Xue¹, Albert Wilcox¹, Yongxin Chen¹², Animesh Garg¹²
QueST is a self-supervised skill abstraction method for learning continuous control. The paper proposes Quantized Skill Transformer (QueST), a novel architecture that learns a larger and more flexible latent encoding capable of modeling a wide range of low-level skills. QueST introduces a causal inductive bias from action-sequence data into the latent space, yielding representations that are more semantically meaningful and more transferable across tasks. Compared against state-of-the-art imitation learning and latent variable model (LVM) baselines, QueST shows strong performance on multitask and few-shot learning benchmarks, outperforming these baselines by 8% in multitask and 14% in few-shot imitation learning. Evaluations on benchmarks including LIBERO and MetaWorld demonstrate its effectiveness in learning generalizable low-level skills. The paper also discusses limitations, including the need for larger datasets, and points to future work on exploring other inductive biases. The results highlight the potential of leveraging large multi-modal language models in the second training stage for improved performance.
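To make the two-stage idea more concrete, here is a minimal, illustrative sketch of stage 1: compressing action chunks into discrete skill tokens with a causal convolutional encoder and a vector-quantized codebook. This is not the authors' implementation; the module names, hyperparameters, the simple nearest-neighbour quantizer, and the omission of the codebook/commitment losses and of the stage-2 skill-prior transformer are all simplifying assumptions made here for illustration.

```python
# Illustrative sketch only; hyperparameters and module names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D convolution that only looks at current and past timesteps (left padding)."""
    def __init__(self, in_ch, out_ch, kernel_size, stride):
        super().__init__()
        self.pad = kernel_size - stride
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, stride=stride)

    def forward(self, x):                                   # x: (B, C, T)
        return self.conv(F.pad(x, (self.pad, 0)))

class SkillQuantizer(nn.Module):
    """Stage 1 (sketch): encode an action chunk into discrete skill tokens and decode it back."""
    def __init__(self, action_dim=7, codebook_size=512, latent_dim=64):
        super().__init__()
        # Causal, strided convolutions downsample the action chunk into a short latent sequence.
        self.encoder = nn.Sequential(
            CausalConv1d(action_dim, latent_dim, kernel_size=4, stride=2),
            nn.GELU(),
            CausalConv1d(latent_dim, latent_dim, kernel_size=4, stride=2),
        )
        self.codebook = nn.Embedding(codebook_size, latent_dim)
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent_dim, latent_dim, kernel_size=4, stride=2, padding=1),
            nn.GELU(),
            nn.ConvTranspose1d(latent_dim, action_dim, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, actions):                              # actions: (B, T, action_dim)
        z = self.encoder(actions.transpose(1, 2)).transpose(1, 2)   # (B, T/4, D)
        # Nearest-neighbour lookup into the codebook (vector quantization).
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)  # (B, T/4, K)
        codes = dists.argmin(dim=-1)                          # discrete skill tokens
        z_q = self.codebook(codes)
        z_q = z + (z_q - z).detach()                          # straight-through estimator
        recon = self.decoder(z_q.transpose(1, 2)).transpose(1, 2)    # (B, T, action_dim)
        return recon, codes

if __name__ == "__main__":
    model = SkillQuantizer()
    actions = torch.randn(8, 32, 7)            # batch of 32-step, 7-DoF action chunks
    recon, codes = model(actions)
    loss = F.mse_loss(recon, actions)          # reconstruction term; VQ losses omitted for brevity
    # Stage 2 (not shown): an autoregressive transformer, conditioned on observations and the
    # task description, would predict the `codes` sequence, which the decoder maps back to actions.
    print(recon.shape, codes.shape, loss.item())
```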