Deep Reinforcement Learning with Double Q-learning

8 Dec 2015 | Hado van Hasselt, Arthur Guez, and David Silver
The paper "Deep Reinforcement Learning with Double Q-learning" by Hado van Hasselt, Arthur Guez, and David Silver from Google DeepMind addresses the problem of overestimation in Q-learning, particularly when combined with deep neural networks. The authors show that the DQN algorithm, which pairs Q-learning with a deep neural network, suffers from substantial overestimation on Atari 2600 games. They propose a modification called Double DQN, which generalizes the Double Q-learning algorithm to large-scale function approximation by decoupling action selection from action evaluation. This modification reduces overestimation and improves performance on several games. The paper also provides theoretical insight into why overestimation arises and demonstrates its negative effect on policy quality. Empirical results on Atari 2600 games show that Double DQN improves both the accuracy of value estimates and the stability of learning, achieving state-of-the-art results. The authors conclude that reducing overestimation is crucial for improving the quality of learned policies in reinforcement learning.
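To make the decoupling concrete: the standard DQN target is Y = R + γ max_a Q(S', a; θ⁻), so the target network both selects and evaluates the greedy action, while the Double DQN target is Y = R + γ Q(S', argmax_a Q(S', a; θ); θ⁻), where the online network selects the action and the target network evaluates it. The following is a minimal NumPy sketch of the two target computations, not the paper's code; the arrays `next_q_online` and `next_q_target` stand in for the outputs of hypothetical online and target networks on a batch of next states.

```python
import numpy as np

def dqn_target(rewards, next_q_target, gamma, dones):
    """Standard DQN target: the target network's own estimates are both
    maximized over and evaluated, which the paper shows causes overestimation."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def double_dqn_target(rewards, next_q_online, next_q_target, gamma, dones):
    """Double DQN target: the online network selects the greedy action,
    the target network evaluates it, decoupling selection from evaluation."""
    greedy_actions = next_q_online.argmax(axis=1)  # selection (online net)
    evaluated = next_q_target[np.arange(len(rewards)), greedy_actions]  # evaluation (target net)
    return rewards + gamma * (1.0 - dones) * evaluated

# Toy usage: a batch of 4 transitions with 3 actions and random value estimates.
rng = np.random.default_rng(0)
rewards = rng.normal(size=4)
dones = np.zeros(4)  # 1.0 where the episode terminated
next_q_online = rng.normal(size=(4, 3))
next_q_target = rng.normal(size=(4, 3))
print(dqn_target(rewards, next_q_target, gamma=0.99, dones=dones))
print(double_dqn_target(rewards, next_q_online, next_q_target, gamma=0.99, dones=dones))
```

Note that Double DQN requires no extra networks beyond standard DQN: it reuses the existing target network as the second value estimator, which is why the modification is essentially free to apply.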