PRIORITIZED EXPERIENCE REPLAY


25 Feb 2016 | Tom Schaul, John Quan, Ioannis Antonoglou and David Silver
This paper introduces prioritized experience replay, a method that makes learning from an experience replay memory more efficient in reinforcement learning. The key idea is to replay transitions with high expected learning progress more frequently, where progress is estimated by the magnitude of each transition's temporal-difference (TD) error, so that updates concentrate on the most informative experience and learning converges faster. Because always replaying the highest-error transitions would reduce diversity and introduce bias, the method uses stochastic prioritization, which interpolates between greedy TD-error prioritization and uniform random sampling, and it corrects the bias that non-uniform sampling introduces with importance-sampling weights; the sampling probabilities and correction weights are summarized below.

The benefits are first illustrated on the 'Blind Cliffwalk' environment, where prioritized replay significantly reduces the number of learning steps required compared with uniform replay. The idea also extends to other settings, including supervised learning and off-policy reinforcement learning, where it improves performance by focusing on informative samples and reducing variance.
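The summary above does not spell out the exact formulas, but in the proportional variant described in the original paper the key quantities can be written as follows (the exponents alpha and beta and the constant epsilon are the paper's hyperparameters, not terms introduced in this summary):

\[
p_i = |\delta_i| + \epsilon, \qquad
P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}, \qquad
w_i = \frac{\left( N \cdot P(i) \right)^{-\beta}}{\max_j w_j}
\]

Here \(\delta_i\) is the TD error of transition \(i\), \(\epsilon\) is a small positive constant that keeps every transition sampleable, \(\alpha\) controls how strongly prioritization is applied (\(\alpha = 0\) recovers uniform sampling), \(N\) is the size of the replay memory, and \(\beta\) controls how fully the importance-sampling correction is applied. The weight \(w_i\) multiplies the TD error in the Q-learning update.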
At full scale, prioritized replay is implemented in an agent based on the Double DQN algorithm, an extension of the Deep Q-Network (DQN) that achieved human-level performance on many Atari games, with prioritized sampling replacing uniform random sampling from the replay memory. On the Atari 2600 benchmark suite this agent outperforms the uniform-replay baseline on 41 out of 49 games, raises the median human-normalized performance from 48% to 106%, and reaches a new state of the art; it also shortens the delay before performance gets off the ground in games that otherwise suffer from such a delay. The paper concludes that prioritized replay is a robust and scalable technique for improving learning efficiency in reinforcement learning, and that further extensions hold promise for class-imbalanced supervised learning and other applications.
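To make the sampling and weighting concrete, here is a minimal Python sketch of a proportional prioritized replay buffer. It is not the paper's implementation: the paper stores priorities in a sum-tree so that sampling and priority updates take O(log N) time, whereas this sketch recomputes probabilities over a flat array in O(N) per batch. The class and method names (PrioritizedReplayBuffer, add, sample, update_priorities) and the default values of alpha, beta, and epsilon are illustrative assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (flat array, O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, epsilon=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities shape sampling (0 = uniform)
        self.epsilon = epsilon      # keeps every transition's priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions receive the current maximum priority so they are
        # replayed at least once before their TD error is known.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()                  # P(i) = p_i^alpha / sum_k p_k^alpha
        idx = np.random.choice(n, batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()                       # normalize for stability
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Proportional variant: p_i = |delta_i| + epsilon.
        self.priorities[idx] = np.abs(td_errors) + self.epsilon
```

In the full agent described in the paper, the sampled weights multiply the TD errors in the Double DQN update, and beta is annealed from its initial value toward 1 over training so that the bias correction is complete by the end of learning.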