4 Jul 2016 | Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy
The paper introduces *bootstrapped DQN*, an algorithm designed to address the challenge of efficient exploration in reinforcement learning (RL). Traditional dithering strategies, such as $\epsilon$-greedy, do not perform deep exploration, which can lead to exponential increases in data requirements. Bootstrapped DQN combines deep exploration with deep neural networks and learns significantly faster than any dithering strategy. The algorithm leverages random initialization to produce reasonable uncertainty estimates for neural networks at low computational cost: it maintains several randomly initialized value-function "heads" on a shared network, samples one head at the start of each episode, and follows it greedily for the episode's duration. In the Arcade Learning Environment, bootstrapped DQN demonstrates substantial improvements in learning speed and cumulative performance across most games. The paper also discusses the theoretical foundations of deep exploration and the advantages of randomized value functions, which are compatible with complex nonlinear value-function representations. Experimental results show that bootstrapped DQN outperforms DQN in both learning speed and cumulative reward, highlighting its effectiveness in large-scale problems.
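As a rough illustration of the multi-headed architecture and per-episode head sampling described above, here is a minimal PyTorch sketch. The class name `BootstrappedQNetwork`, the layer sizes, the default of 10 heads, and the toy environment interface in `run_episode` are assumptions for this example, not details taken from the paper.

```python
import random
import torch
import torch.nn as nn

class BootstrappedQNetwork(nn.Module):
    """Illustrative K-headed Q-network: a shared torso feeds K output heads,
    each of which can be trained on its own bootstrapped subset of experience."""

    def __init__(self, obs_dim: int, num_actions: int, num_heads: int = 10):
        super().__init__()
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # Each head gets its own random initialization; the resulting diversity
        # across heads is what produces the uncertainty estimates that drive
        # deep exploration.
        self.heads = nn.ModuleList(
            nn.Linear(256, num_actions) for _ in range(num_heads)
        )

    def forward(self, obs: torch.Tensor, head: int) -> torch.Tensor:
        return self.heads[head](self.torso(obs))


def run_episode(env, q_net: BootstrappedQNetwork, num_heads: int) -> int:
    """Sample one head at the start of the episode and act greedily with it,
    giving temporally consistent exploration instead of per-step dithering."""
    head = random.randrange(num_heads)
    obs, done = env.reset(), False  # hypothetical env returning a flat observation
    while not done:
        with torch.no_grad():
            q_values = q_net(torch.as_tensor(obs, dtype=torch.float32), head)
        action = int(q_values.argmax())
        obs, reward, done = env.step(action)  # hypothetical (obs, reward, done) step interface
    return head
```

In training, each stored transition would also carry a bootstrap mask indicating which heads may learn from it, so the heads remain approximately independent samples of the value function rather than identical copies.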