16 Nov 2016 | Max Jaderberg*, Volodymyr Mnih*, Wojciech Marian Czarnecki*, Tom Schaul, Joel Z Leibo, David Silver & Koray Kavukcuoglu
This paper introduces the UNREAL agent (UNsupervised REinforcement and Auxiliary Learning), which combines reinforcement learning with unsupervised auxiliary tasks to improve performance in complex environments. The agent learns to maximize both the extrinsic reward and pseudo-rewards derived from its sensorimotor stream. The auxiliary tasks push the agent towards a more robust and efficient representation of the environment, which helps it make progress on the actual task more quickly. UNREAL outperforms the previous state of the art on the Atari and Labyrinth domains, averaging 880% expert human performance on Atari and 87% on Labyrinth, with a roughly 10× speedup in learning on Labyrinth.
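Concretely, the overall objective is the base A3C loss plus weighted auxiliary losses; in the paper's notation the decomposition is roughly of the form

\[
\mathcal{L}_{\mathrm{UNREAL}}(\theta) = \mathcal{L}_{\mathrm{A3C}} + \lambda_{\mathrm{VR}}\,\mathcal{L}_{\mathrm{VR}} + \lambda_{\mathrm{PC}} \sum_{c} \mathcal{L}_{Q}^{(c)} + \lambda_{\mathrm{RP}}\,\mathcal{L}_{\mathrm{RP}},
\]

where \(\mathcal{L}_{\mathrm{A3C}}\) is the on-policy actor-critic loss, \(\mathcal{L}_{\mathrm{VR}}\) the value-function replay loss, \(\mathcal{L}_{Q}^{(c)}\) the n-step Q-learning losses of the auxiliary control tasks \(c\), \(\mathcal{L}_{\mathrm{RP}}\) the reward-prediction loss, and the \(\lambda\) weights are hyperparameters.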
The agent augments its base objective with two kinds of auxiliary tasks: auxiliary control and auxiliary reward prediction. The auxiliary control tasks are pixel control, where the agent learns policies that maximally change the pixel intensities in different regions of its visual input, and feature control, where it learns to maximally activate units in a hidden layer of its own network. The auxiliary reward prediction task asks the agent to predict the immediate reward in the next, unseen frame given a short history of recent observations. These tasks focus the agent's representation on aspects of the environment that matter for the task and improve the efficiency of learning; a sketch of the pixel-control pseudo-reward follows below.
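For instance, the pixel-control pseudo-rewards can be computed directly from two consecutive observations. The sketch below is illustrative only; the 4-pixel cell size, the absence of cropping, and the function name are assumptions rather than the paper's exact preprocessing.

    import numpy as np

    def pixel_control_rewards(obs, next_obs, cell=4):
        """Pseudo-rewards for the pixel-control auxiliary task (illustrative).

        Each pseudo-reward is the average absolute change in pixel intensity
        within one cell of a coarse grid laid over the input image.
        """
        # Per-pixel intensity change, averaged over colour channels.
        diff = np.abs(next_obs.astype(np.float32) - obs.astype(np.float32)).mean(axis=-1)
        h, w = diff.shape
        # Trim to a multiple of the cell size, then average within each cell.
        diff = diff[: h - h % cell, : w - w % cell]
        cells = diff.reshape(diff.shape[0] // cell, cell, diff.shape[1] // cell, cell)
        return cells.mean(axis=(1, 3))  # shape (grid_h, grid_w): one pseudo-reward per cell

An auxiliary Q-learning head is then trained to act so as to maximize these per-cell pseudo-rewards.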
The agent also keeps a replay buffer of recent experience. Replayed sequences provide additional value-function regression targets (value replay) and the training data for reward prediction, where rewarding events are over-sampled because rewards are rare in sparse-reward environments; both uses add updates and improve the stability of learning. All auxiliary tasks share the base agent's convolutional and recurrent representation, so the extra learning signal shapes features that also help the base agent optimize the extrinsic reward more efficiently.
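A minimal sketch of the skewed sampling used for reward prediction, assuming the replay buffer is a plain list of (observation, reward) pairs; the three-frame sequence length and the helper name are illustrative.

    import random

    def sample_reward_prediction_sequence(replay, seq_len=3):
        """Sample a short frame history whose following (unseen) reward must be predicted.

        Rewarding end-frames are drawn half the time, over-sampling the rare
        rewarding events; assumes the buffer holds both rewarding and
        non-rewarding transitions.
        """
        want_reward = random.random() < 0.5
        while True:
            end = random.randint(seq_len, len(replay) - 1)
            reward = replay[end][1]
            if (reward != 0) == want_reward:
                frames = [obs for obs, _ in replay[end - seq_len:end]]
                return frames, reward  # target: is the next reward positive, negative or zero

The reward-prediction head only shapes the shared representation; its predictions are not used to select actions.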
The UNREAL agent is evaluated on the Labyrinth and Atari domains, where it significantly outperforms the baseline A3C agent: it reaches higher final performance, learns faster, and is more robust to the choice of hyperparameters. These results suggest that UNREAL is a markedly more efficient and effective learner on complex tasks, making unsupervised auxiliary tasks a promising approach for reinforcement learning in real-world environments.