A Distributional Perspective on Reinforcement Learning

2017 | Marc G. Bellemare, Will Dabney, Rémi Munos
This paper advocates for a distributional perspective in reinforcement learning, modeling the full distribution of the random return rather than only its expected value. The authors argue that this perspective is crucial for understanding and improving approximate reinforcement learning. They present theoretical results showing that the distributional Bellman operator is a contraction in the Wasserstein metric for policy evaluation, but not in the control setting, where it can exhibit significant instability. To address this, they propose a new algorithm that learns approximate value distributions via the distributional Bellman equation. Evaluated on the Arcade Learning Environment, the algorithm achieves state-of-the-art results, demonstrating the importance of the value distribution in approximate reinforcement learning. The paper concludes by discussing the benefits of the distributional perspective, including reduced chattering, better handling of state aliasing, and a richer set of predictions.
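The core update the summary describes can be illustrated with a small sketch. Below is a hedged, minimal NumPy implementation of a categorical projection step in the spirit of the paper's algorithm: a value distribution is represented by probabilities over a fixed, evenly spaced support of atoms, the distributional Bellman target r + γz is formed, and its mass is projected back onto the support. The function name `categorical_projection` and the specific parameterization are illustrative choices, not the paper's exact code.

```python
import numpy as np

def categorical_projection(p_next, reward, gamma, z):
    """Project the target distribution of (reward + gamma * z) onto support z.

    p_next: (N,) probabilities over atoms z for the next state-action pair
    reward: scalar reward
    gamma:  scalar discount factor (use 0 for terminal transitions)
    z:      (N,) fixed, evenly spaced support atoms
    """
    v_min, v_max = z[0], z[-1]
    delta_z = z[1] - z[0]
    # Apply the distributional Bellman operator, clipping to the support range.
    tz = np.clip(reward + gamma * z, v_min, v_max)
    # Fractional index of each shifted atom on the fixed support.
    b = (tz - v_min) / delta_z
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    m = np.zeros_like(p_next)
    # Split each atom's probability between its two nearest support points.
    np.add.at(m, lower, p_next * (upper - b))
    np.add.at(m, upper, p_next * (b - lower))
    # If a shifted atom lands exactly on a support point, both fractions above
    # are zero; assign the full mass to that point instead.
    exact = (lower == upper)
    np.add.at(m, lower[exact], p_next[exact])
    return m
```

For example, with a uniform distribution over five atoms on [-1, 1] and a zero-reward, γ = 1 transition, the projection returns the input unchanged; with a nonzero reward, mass is shifted along the support while still summing to one.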