19 Dec 2013 | Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
This paper presents a deep reinforcement learning model that learns control policies directly from high-dimensional sensory input, in this case raw pixels, using a convolutional neural network trained with a variant of Q-learning. Applied to seven Atari 2600 games from the Arcade Learning Environment (ALE), the model outperforms all previous approaches on six games and surpasses a human expert on three. It learns from raw video frames, the reward signal, and the set of legal actions alone, with no hand-crafted features or game-specific prior knowledge, and the same network architecture and hyperparameters are used for every game, demonstrating the method's robustness.

The core contribution is combining deep learning with reinforcement learning: a deep Q-network estimates the action-value function, and training proceeds by stochastic gradient descent on the Q-learning loss. Learning from raw video is hard because rewards are sparse, noisy, and delayed, successive frames are strongly correlated, and the data distribution shifts as the policy changes. The paper addresses these challenges with experience replay, which stores past transitions and samples random minibatches from them, breaking temporal correlations and smoothing the training distribution. The result is training that is both better-performing and more stable than prior methods, and the learned policies exhibit nontrivial strategies that generalize across game situations, demonstrating the effectiveness of deep reinforcement learning in complex, high-dimensional environments.
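To make the training loop concrete, here is a minimal sketch of deep Q-learning with experience replay in PyTorch. The convolutional layer sizes follow the architecture reported in the paper (16 filters of 8x8 stride 4, 32 filters of 4x4 stride 2, a 256-unit hidden layer, one output per action on 84x84 frame stacks), but this is not the authors' code: the class and function names (`QNetwork`, `ReplayBuffer`, `dqn_update`), the buffer capacity, and the use of an MSE loss with a generic optimizer are illustrative assumptions.

```python
# Illustrative sketch of deep Q-learning with experience replay (not the authors' code).
# Assumes preprocessed 84x84 grayscale frames stacked 4 deep, as described in the paper.
import random
from collections import deque

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Convolutional network mapping a stack of frames to one Q-value per action."""

    def __init__(self, num_actions, in_channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),  # 84x84 -> 20x20
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),           # 20x20 -> 9x9
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, x):
        return self.net(x)


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive frames and reuses each transition in many updates.
        return random.sample(self.memory, batch_size)


def dqn_update(q_net, optimizer, batch, gamma=0.99):
    """One stochastic gradient step on the Q-learning loss
    (r + gamma * max_a' Q(s', a') - Q(s, a))^2, averaged over the minibatch."""
    states, actions, rewards, next_states, dones = map(torch.stack, zip(*batch))
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = q_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)  # bootstrap target
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In an actual run, an epsilon-greedy policy would collect transitions into the buffer while `dqn_update` is called on sampled minibatches; the paper additionally clips rewards to {-1, 0, +1} and uses RMSProp-style gradient updates.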