Bridging State and History Representations: Understanding Self-Predictive RL


2024 | Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon
This paper presents a unified view of state and history representations in reinforcement learning (RL), showing that many seemingly distinct methods are based on a common idea of self-predictive abstraction. The authors analyze the relationships between various representations, including self-predictive, observation-predictive, and Q*-irrelevance abstractions, and provide theoretical insights into their properties and learning objectives. They demonstrate that self-predictive representations can be learned through a minimalist algorithm that uses a single auxiliary loss, without the need for reward modeling or planning. The algorithm is validated on standard MDPs, MDPs with distractors, and POMDPs with sparse rewards, showing improved sample efficiency and robustness compared to existing methods. The paper also highlights the importance of using stop-gradient techniques in learning self-predictive representations, which helps avoid representational collapse. The findings provide a set of guidelines for RL practitioners, emphasizing the role of representation learning in improving policy optimization. The work bridges the gap between different approaches in RL and offers a principled analysis of state and history representation learning.
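To make the core idea concrete, below is a minimal sketch of a self-predictive auxiliary loss with a stop-gradient on the target encoding, written in PyTorch. The encoder, latent transition model, dimensions, and function names are illustrative assumptions for exposition, not the authors' exact implementation.

```python
# Minimal sketch (assumed architecture): predict the next latent state from the
# current latent state and action, stopping gradients through the target encoding.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, latent_dim = 16, 4, 32

# Encoder phi: maps observations (or histories, if a sequence model is used) to latents.
encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
# Latent transition model: predicts phi(o') from (phi(o), a).
transition = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                           nn.Linear(64, latent_dim))

def self_predictive_loss(obs, action, next_obs):
    """Auxiliary loss: match the predicted next latent to a stop-gradient target."""
    z = encoder(obs)
    z_next_pred = transition(torch.cat([z, action], dim=-1))
    with torch.no_grad():                   # stop-gradient: no backprop through the target,
        z_next_target = encoder(next_obs)   # which helps avoid representational collapse
    return F.mse_loss(z_next_pred, z_next_target)

# Dummy batch; in practice this term is added to the usual RL (e.g., TD) loss.
obs = torch.randn(8, obs_dim)
action = F.one_hot(torch.randint(act_dim, (8,)), act_dim).float()
next_obs = torch.randn(8, obs_dim)
loss = self_predictive_loss(obs, action, next_obs)
loss.backward()
```

In this sketch the auxiliary loss requires no reward model or decoder over observations, mirroring the paper's point that a single latent self-prediction term can suffice as the representation-learning signal.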