A Distributional Perspective on Reinforcement Learning

2017 | Marc G. Bellemare, Will Dabney, Rémi Munos
This paper argues for the fundamental importance of the value distribution in reinforcement learning (RL), in contrast with the common approach of modeling only the expected return. The value distribution is the distribution of the random return received by an agent, and the paper contends that learning this full distribution, rather than its expectation alone, matters both in theory and in practice.

On the theoretical side, the paper shows that the distributional Bellman operator used for policy evaluation is a contraction in a maximal form of the Wasserstein metric, which supports stable learning. The control setting is less well behaved: the distributional Bellman optimality operator is not a contraction in any metric between distributions and can exhibit instability.

Building on the distributional Bellman equation, the paper proposes an algorithm that learns approximate value distributions. The algorithm represents the value distribution with a parametric distribution and applies a projected Bellman update to keep the updated distribution within that parametric family. Evaluated on the Arcade Learning Environment (ALE), the algorithm achieves state-of-the-art results. The experiments indicate that learning the full value distribution yields better performance and more stable learning than learning only the expected return, suggesting that the distribution of returns gives a more faithful account of the learning process and can lead to better performance in complex environments.
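Concretely, the distributional Bellman equation underlying this approach is an equality in distribution between the random return and its one-step bootstrap:

    Z(x, a) \overset{D}{=} R(x, a) + \gamma Z(X', A'), \qquad X' \sim P(\cdot \mid x, a), \quad A' \sim \pi(\cdot \mid X'),

where Z(x, a) is the random return from state x and action a, R(x, a) is the random reward, \gamma is the discount factor, and \pi is the policy being evaluated; taking expectations on both sides recovers the familiar Bellman equation for Q^\pi.

To make the projected Bellman update concrete, below is a minimal NumPy sketch of one way such a projection can be implemented with a categorical parametrization over a fixed support of atoms, in the spirit of the paper's categorical algorithm. The function name categorical_projection, the default support bounds, and the 51-atom default are illustrative assumptions rather than details quoted from the paper.

import numpy as np

def categorical_projection(next_probs, rewards, dones, gamma,
                           v_min=-10.0, v_max=10.0, n_atoms=51):
    # Project the Bellman-updated target distribution r + gamma * Z(x', a')
    # back onto the fixed support {z_0, ..., z_{N-1}}.
    #
    # next_probs: (batch, n_atoms) probabilities of the next-state distribution.
    # rewards, dones: (batch,) arrays, with dones as 0/1 terminal flags.
    # Returns (batch, n_atoms) target probabilities on the original support.
    batch = next_probs.shape[0]
    delta_z = (v_max - v_min) / (n_atoms - 1)
    z = v_min + delta_z * np.arange(n_atoms)            # fixed atom locations

    # Apply the Bellman update to each atom, then clip it to the support.
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * z[None, :]
    tz = np.clip(tz, v_min, v_max)

    # Fractional index of each updated atom on the original support.
    b = (tz - v_min) / delta_z
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)

    # Split each atom's probability mass between its two nearest atoms.
    m = np.zeros((batch, n_atoms))
    for i in range(batch):
        for j in range(n_atoms):
            if lower[i, j] == upper[i, j]:               # b landed exactly on an atom
                m[i, lower[i, j]] += next_probs[i, j]
            else:
                m[i, lower[i, j]] += next_probs[i, j] * (upper[i, j] - b[i, j])
                m[i, upper[i, j]] += next_probs[i, j] * (b[i, j] - lower[i, j])
    return m

Splitting each updated atom's mass between its two nearest neighbours keeps the target distribution on the original support, so a network predicting the value distribution can then be trained by minimizing a divergence (for example, cross-entropy) between this projected target and its prediction.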