16 Nov 2016 | Max Jaderberg*, Volodymyr Mnih*, Wojciech Marian Czarnecki*, Tom Schaul, Joel Z Leibo, David Silver & Koray Kavukcuoglu
This paper introduces an agent that maximizes multiple pseudo-reward functions simultaneously through reinforcement learning, sharing a common representation that continues to develop even in the absence of extrinsic rewards. The agent also includes a novel mechanism to focus this representation on extrinsic rewards, enabling rapid adaptation to the most relevant aspects of the task. The UNREAL agent outperforms previous state-of-the-art methods on Atari games, achieving 880% of expert human performance, and on challenging 3D Labyrinth tasks, with a mean learning speedup of 10× and 87% expert human performance. The paper combines the Asynchronous Advantage Actor-Critic (A3C) framework with auxiliary control and reward prediction tasks, using experience replay to improve efficiency and stability. The UNREAL agent demonstrates superior performance, robustness to hyperparameters, and data efficiency compared to vanilla A3C.
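A minimal sketch of how the combined objective might look, assuming the base A3C loss plus weighted auxiliary terms (the function name, argument names, and weight values below are illustrative assumptions, not the authors' code; the paper's auxiliary tasks include pixel control, reward prediction, and value function replay):

```python
def unreal_loss(a3c_loss, pixel_control_loss, reward_prediction_loss,
                value_replay_loss, lambda_pc=1.0, lambda_rp=1.0, lambda_vr=1.0):
    """Combine the base A3C loss with weighted auxiliary-task losses.

    All names and default weights here are assumptions for illustration;
    the key idea is that the shared representation is trained on the sum
    of the main RL objective and the auxiliary pseudo-reward objectives.
    """
    return (a3c_loss
            + lambda_pc * pixel_control_loss   # auxiliary control (pixel changes)
            + lambda_rp * reward_prediction_loss  # predict reward from replayed frames
            + lambda_vr * value_replay_loss)      # extra value regression on replayed data


# Example usage with placeholder scalar losses:
total = unreal_loss(a3c_loss=1.2, pixel_control_loss=0.4,
                    reward_prediction_loss=0.1, value_replay_loss=0.3)
print(total)
```

Because every term shares the same convolutional/LSTM representation, gradients from the auxiliary tasks keep shaping useful features even when extrinsic rewards are sparse.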