16 Jun 2016 | Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P. Lillicrap, David Silver, Koray Kavukcuoglu
The paper introduces a lightweight framework for deep reinforcement learning that uses asynchronous gradient descent to optimize deep neural network controllers. It presents asynchronous variants of four standard reinforcement learning algorithms and demonstrates that running parallel actor-learners stabilizes training, allowing all four methods to successfully train neural network controllers. The best-performing method, an asynchronous variant of actor-critic, surpasses the previous state of the art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. The same method also succeeds on continuous motor control tasks and on a new task of navigating random 3D mazes from visual input. The framework offers practical benefits, such as reduced training time and the ability to use on-policy reinforcement learning methods with deep neural networks. The authors examine the stability and scalability of the proposed methods, showing substantial speedups as the number of parallel actor-learners increases, and find the algorithms robust across a wide range of learning rates and random initializations.
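To make the core idea concrete, here is a minimal, illustrative sketch of asynchronous advantage actor-critic training: several actor-learner threads interact with their own copy of an environment and apply n-step policy-gradient and value updates to shared parameters without locks, Hogwild-style. Everything here is my own simplification for illustration: the toy chain environment, the tabular (rather than deep network) policy and value function, the hyperparameters, and the omission of the entropy regularization term and RMSProp used in the paper.

```python
import numpy as np
import threading

N_STATES = 6
N_ACTIONS = 2
GAMMA = 0.99
T_MAX = 5            # n-step rollout length per update
LR_POLICY = 0.1
LR_VALUE = 0.1
TOTAL_STEPS = 20000  # rough global step budget shared by all workers

# Shared parameters, updated asynchronously by all workers (no locks),
# mimicking the parallel actor-learner setup described in the paper.
theta = np.zeros((N_STATES, N_ACTIONS))  # tabular policy logits
w = np.zeros(N_STATES)                   # tabular state-value estimates
global_step = [0]                        # mutable counter shared across threads


class ChainEnv:
    """Toy chain MDP: action 1 moves right toward a +1 goal, action 0 moves left."""
    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else self.s + 1
        done = self.s == N_STATES - 1
        return self.s, (1.0 if done else 0.0), done


def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()


def worker(seed):
    rng = np.random.default_rng(seed)
    env = ChainEnv()
    s = env.reset()
    while global_step[0] < TOTAL_STEPS:
        # Collect up to T_MAX steps with the current shared policy.
        traj = []
        done = False
        for _ in range(T_MAX):
            probs = softmax(theta[s])
            a = rng.choice(N_ACTIONS, p=probs)
            s2, r, done = env.step(a)
            traj.append((s, a, r))
            global_step[0] += 1
            s = s2
            if done:
                s = env.reset()
                break
        # Bootstrap the return from the value estimate if the rollout did not terminate.
        R = 0.0 if done else w[s]
        # Walk the rollout backwards, computing advantages and applying
        # policy-gradient and value updates directly to the shared parameters.
        for (st, at, rt) in reversed(traj):
            R = rt + GAMMA * R
            adv = R - w[st]
            probs = softmax(theta[st])
            grad_logp = -probs
            grad_logp[at] += 1.0               # d log pi(a|s) / d logits
            theta[st] += LR_POLICY * adv * grad_logp
            w[st] += LR_VALUE * adv            # gradient step on (R - V(s))^2


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Probability of moving right in each state:")
print(np.round([softmax(theta[s])[1] for s in range(N_STATES)], 2))
```

In the paper the shared parameters belong to a deep network and each thread accumulates gradients over its rollout before applying them with a shared RMSProp optimizer; the sketch above keeps only the structural point, namely that many actors exploring in parallel can update one set of parameters asynchronously and still learn stably.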