Foundation Policies with Hilbert Representations

2024 | Seohong Park, Tobias Kreiman, Sergey Levine
This paper introduces a novel unsupervised framework for pre-training generalist policies that can capture diverse, optimal, long-horizon behaviors from unlabeled offline data. The key insight is to learn a structured representation that preserves the temporal structure of the underlying environment and then span this learned latent space with directional movements. This enables various zero-shot policy "prompting" schemes for downstream tasks. Through experiments on simulated robotic locomotion and manipulation benchmarks, the authors demonstrate that their unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot manner, often outperforming prior methods designed specifically for each setting. The main contributions include the introduction of Hilbert foundation policies (HILPs) and their evaluation on zero-shot RL, offline goal-conditioned RL, and hierarchical RL tasks. The code and videos are available at <https://seohong.me/projects/hilp/>.
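The idea of spanning the latent space with "directional movements" can be illustrated with a small sketch. It assumes a representation `phi` in which distances between latent states reflect temporal distances in the environment, and a reward that measures how far one transition moves along a unit direction `z` in that space. The function names, the identity stand-in for `phi`, and the toy states are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def directional_reward(phi_s, phi_next, z):
    """Progress of the latent state along direction z for one transition."""
    return float(np.dot(phi_next - phi_s, z))

def goal_prompt(phi_s, phi_goal, eps=1e-8):
    """Unit latent direction pointing from the current state toward the goal."""
    diff = phi_goal - phi_s
    return diff / (np.linalg.norm(diff) + eps)

# Hypothetical stand-in for a learned representation: identity on 2-D states.
def phi(state):
    return np.asarray(state, dtype=float)

state, goal = [0.0, 0.0], [3.0, 0.0]
z = goal_prompt(phi(state), phi(goal))

# A step toward the goal earns positive reward, a step away earns negative reward.
r_toward = directional_reward(phi(state), phi([1.0, 0.0]), z)
r_away = directional_reward(phi(state), phi([-1.0, 0.0]), z)
```

Under these assumptions, "prompting" a goal-conditioned task amounts to choosing `z` as the normalized latent difference between the goal and the current state, so a policy trained to maximize the directional reward for arbitrary `z` needs no further training to head toward a new goal.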