21 Apr 2024 | Tianwei Ni*, Benjamin Eysenbach*, Erfan Seyedsalehi*, Michel Ma*, Clement Gehring*, Aditya Mahajan*, Pierre-Luc Bacon*
This paper explores the relationships between various state and history representation learning methods in reinforcement learning (RL), particularly focusing on their common underlying idea of self-predictive abstraction. The authors provide theoretical insights into the optimization of widely used objectives, such as the stop-gradient technique, and propose a minimalist algorithm for learning self-predictive representations. The algorithm integrates an auxiliary task into any model-free RL algorithm, enabling the end-to-end learning of self-predictive representations without the need for reward model learning, planning, multi-step predictions, or metric learning. Extensive experiments on standard MDPs, MDPs with distractors, and POMDPs with sparse rewards validate the effectiveness of the proposed algorithm and support the theoretical findings. The paper concludes with preliminary guidelines for RL practitioners, emphasizing the importance of analyzing task characteristics and using the minimalist algorithm as a baseline for independent evaluation of representation learning and policy optimization effects.
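To make the idea concrete, the sketch below shows what such a self-predictive auxiliary loss with a stop-gradient target could look like when bolted onto a model-free agent. This is a minimal illustration in PyTorch, not the authors' implementation: the module names (Encoder, LatentPredictor), network sizes, and the L2 prediction loss are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact code) of a self-predictive auxiliary
# loss with the stop-gradient technique, intended to be added to the loss of
# any model-free RL algorithm. Module names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Maps an observation (or history summary) to a latent representation z."""
    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class LatentPredictor(nn.Module):
    """Predicts the next latent z' from the current latent z and the action a."""
    def __init__(self, latent_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, action], dim=-1))


def self_predictive_loss(encoder, predictor, obs, action, next_obs):
    """L2 self-prediction loss with a stop-gradient on the target latent.

    The predicted next latent chases a target produced by the same encoder,
    but gradients are blocked through the target branch (torch.no_grad),
    i.e. the stop-gradient technique analyzed in the paper.
    """
    z = encoder(obs)
    z_next_pred = predictor(z, action)
    with torch.no_grad():  # stop-gradient: no updates flow through the target
        z_next_target = encoder(next_obs)
    return F.mse_loss(z_next_pred, z_next_target)
```

In use, this auxiliary loss would simply be scaled by a coefficient and summed with the base algorithm's loss (e.g. `total_loss = td_loss + aux_coef * self_predictive_loss(...)`), leaving the rest of the model-free agent unchanged; no reward model, planner, multi-step rollout, or metric-learning component is involved.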