16 Jun 2016 | Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P. Lillicrap, David Silver, Koray Kavukcuoglu
This paper introduces an asynchronous framework for deep reinforcement learning (DRL) that trains deep neural network controllers with asynchronous gradient descent. The framework provides asynchronous variants of four standard RL algorithms (one-step Sarsa, one-step Q-learning, n-step Q-learning, and advantage actor-critic) and shows that running parallel actor-learners stabilizes training, allowing all four methods to successfully train neural network controllers. The best-performing method, asynchronous advantage actor-critic (A3C), surpasses the previous state of the art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. A3C also succeeds on continuous motor control tasks and on navigating random 3D mazes from visual input.
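As a rough illustration of the advantage actor-critic update at the heart of A3C, the sketch below computes n-step returns and advantages for a single rollout. This is our own minimal Python example, not the authors' code; the names rewards, values, bootstrap_value, and gamma are illustrative rather than the paper's notation.

```python
# Minimal sketch (not the authors' code): given a rollout of at most t_max steps,
# compute n-step returns and advantages as used by an A3C-style actor-critic.
from typing import List, Tuple

def n_step_returns_and_advantages(
    rewards: List[float],      # r_t collected by one actor-learner
    values: List[float],       # V(s_t) from the critic for the same states
    bootstrap_value: float,    # V(s_{t+n}), or 0.0 if the episode ended
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    returns = []
    R = bootstrap_value
    # Work backwards so each step's return bootstraps from the next one.
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    # Advantage A_t = R_t - V(s_t) scales the policy-gradient term
    # grad log pi(a_t | s_t) * A_t; the critic is regressed toward R_t.
    advantages = [R_t - v for R_t, v in zip(returns, values)]
    return returns, advantages

# Example: a 3-step rollout cut off mid-episode, bootstrapping from V(s_{t+3}).
rets, advs = n_step_returns_and_advantages([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], 0.3)
print(rets, advs)
```

In the paper, each actor-learner accumulates gradients from such a rollout for both the policy and the value function (plus an entropy bonus on the policy) before applying them to the shared network.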
The framework avoids the need for experience replay by leveraging parallel actor-learners with different exploration policies, which decorrelates data and stabilizes learning. This approach reduces training time and resource usage compared to previous methods that rely on GPUs or distributed architectures. The asynchronous framework is implemented on a single machine with a standard multi-core CPU, demonstrating practical benefits and scalability.
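Because each parallel actor-learner follows its own exploration policy, the data streams seen by the learners are decorrelated without a replay buffer, and updates can be applied to the shared parameters in a lock-free, Hogwild!-style fashion. The toy example below is our own construction on a simple bandit problem, not the paper's Atari setup; it only shows the structural idea of several threads with different epsilon values updating one shared parameter table asynchronously.

```python
# Minimal sketch (our toy, not the paper's setup) of lock-free asynchronous
# updates: several actor-learner threads share one parameter vector and apply
# updates directly, each exploring with its own epsilon.
import threading
import random

N_ACTIONS = 4
TRUE_MEANS = [0.1, 0.5, 0.2, 0.9]          # hidden reward means of a toy bandit
shared_q = [0.0] * N_ACTIONS                # shared "parameters" (action values)

def actor_learner(epsilon: float, steps: int, lr: float = 0.05) -> None:
    rng = random.Random()                   # per-thread RNG and exploration rate
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(N_ACTIONS)    # explore
        else:
            a = max(range(N_ACTIONS), key=lambda i: shared_q[i])  # exploit
        reward = rng.gauss(TRUE_MEANS[a], 0.1)
        # Lock-free update of the shared parameters; occasional overwrites
        # from other threads are tolerated, as in Hogwild!-style training.
        shared_q[a] += lr * (reward - shared_q[a])

# Different exploration rates per thread decorrelate the data each one sees.
threads = [threading.Thread(target=actor_learner, args=(eps, 2000))
           for eps in (0.5, 0.3, 0.1, 0.05)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("learned action values:", [round(q, 2) for q in shared_q])
```

In the actual framework the shared parameters are the weights of a deep network, and each thread accumulates gradients over a short rollout before applying them with a shared RMSProp optimizer.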
The paper evaluates the framework on various domains, including the Atari 2600 games, the TORCS 3D car racing simulator, the MuJoCo physics simulator for continuous action control, and the Labyrinth 3D maze environment. Results show that A3C achieves high performance across these domains, with significant improvements in training speed and data efficiency. The framework also demonstrates robustness to learning rates and initializations, with stable learning across a wide range of parameters.
The study highlights the effectiveness of asynchronous methods in DRL, showing that they can achieve state-of-the-art results without specialized hardware. The framework's ability to scale with the number of parallel actor-learners, together with its efficiency in training time and resource usage, makes it a promising approach for future DRL research. The paper also discusses potential extensions, including combining the asynchronous framework with other reinforcement learning methods and with recent advances in deep reinforcement learning.