21 Sep 2020 | Aravind Srinivas*, Michael Laskin*, Pieter Abbeel
CURL is a contrastive unsupervised representation learning method for reinforcement learning (RL) that improves sample efficiency by learning high-level features from raw pixels. It combines contrastive learning with RL to extract semantic representations from images, enabling more efficient control. CURL outperforms prior pixel-based methods on the DeepMind Control Suite and Atari Games, achieving 1.9x and 1.2x performance gains at the 100K environment and interaction steps benchmarks, respectively. It is the first image-based algorithm to nearly match the sample efficiency of state-based methods. CURL is implemented with a simple architecture that integrates contrastive learning with model-free RL, using a momentum-averaged encoder for key observations and a query encoder that is shared with the RL agent. It is compatible with various RL algorithms, including SAC and Rainbow DQN. CURL's contrastive objective uses a bi-linear inner product and is trained jointly with the RL objective. It achieves state-of-the-art performance on multiple environments, surpassing human performance on two games. CURL is open-sourced and available at https://www.github.com/MishaLaskin/curl.
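The core mechanism is easy to sketch: two random crops of the same stacked-frame observation form a positive pair, the query encoder (shared with the RL agent) embeds one crop, the momentum-averaged key encoder embeds the other, and an InfoNCE loss over bilinear similarities q^T W k is minimized alongside the RL loss. Below is a minimal PyTorch sketch of that contrastive head, assuming 84x84 observations with 9 stacked channels; the encoder architecture, feature dimension, and momentum coefficient are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of CURL's contrastive head (illustrative, not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelEncoder(nn.Module):
    """Small convolutional encoder mapping stacked frames to a latent vector.
    Assumes 84x84 inputs with 9 channels (3 stacked RGB frames)."""
    def __init__(self, in_channels=9, feature_dim=50):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * 39 * 39, feature_dim)  # 84 -> 41 -> 39 spatial size

    def forward(self, x):
        h = self.convs(x)
        return self.fc(h.flatten(start_dim=1))

class CURLHead(nn.Module):
    """Bilinear contrastive head: query encoder is trained, key encoder is a
    momentum-averaged copy that receives no gradients."""
    def __init__(self, feature_dim=50):
        super().__init__()
        self.query_encoder = PixelEncoder(feature_dim=feature_dim)
        self.key_encoder = PixelEncoder(feature_dim=feature_dim)
        self.key_encoder.load_state_dict(self.query_encoder.state_dict())
        for p in self.key_encoder.parameters():
            p.requires_grad = False
        self.W = nn.Parameter(torch.rand(feature_dim, feature_dim))  # bilinear matrix

    def compute_logits(self, obs_anchor, obs_positive):
        q = self.query_encoder(obs_anchor)          # (B, D), gradients flow to the shared encoder
        with torch.no_grad():
            k = self.key_encoder(obs_positive)      # (B, D), keys are detached
        logits = q @ self.W @ k.t()                 # (B, B) bilinear inner products
        return logits - logits.max(dim=1, keepdim=True).values  # stabilize the softmax

    @torch.no_grad()
    def momentum_update(self, tau=0.05):
        # Exponential moving average of the query encoder into the key encoder.
        for p_k, p_q in zip(self.key_encoder.parameters(), self.query_encoder.parameters()):
            p_k.mul_(1 - tau).add_(p_q, alpha=tau)

# InfoNCE: diagonal entries are positives (two crops of the same observation),
# all other rows in the batch serve as negatives.
curl = CURLHead()
anchor = torch.randn(8, 9, 84, 84)    # one crop per observation in the batch
positive = torch.randn(8, 9, 84, 84)  # a second crop of the same observations
logits = curl.compute_logits(anchor, positive)
loss = F.cross_entropy(logits, torch.arange(logits.size(0)))
loss.backward()
curl.momentum_update()
```

In the full algorithm this loss is added to the SAC or Rainbow DQN update on the same minibatch, and the query encoder doubles as the RL agent's observation encoder, so representation learning and control share gradients.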