DREAM TO CONTROL: LEARNING BEHAVIORS BY LATENT IMAGINATION


17 Mar 2020 | Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
Dreamer is a reinforcement learning agent that learns long-horizon behaviors from images using latent imagination. It learns behaviors efficiently by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model. On 20 challenging visual control tasks, Dreamer outperforms existing approaches in data-efficiency, computation time, and final performance.

The agent uses a latent dynamics model to predict future rewards in a compact state space, enabling efficient policy optimization without decoding back to images. Dreamer learns both an action model and a value model in the latent space of the world model: the value model is trained for Bellman consistency on imagined rewards, while the action model maximizes the estimated values by propagating their analytic gradients back through the learned dynamics. For representation learning, the paper considers reward prediction, image reconstruction, and contrastive estimation as alternative objectives.

Dreamer is evaluated on the DeepMind Control Suite and outperforms existing methods, including PlaNet and D4PG. The results demonstrate that learning behaviors through latent imagination with world models can outperform top methods based on experience replay. Dreamer also applies to tasks with discrete actions and early episode termination, and future advances in representation learning are likely to improve its performance further.
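To make the core mechanism concrete, here is a minimal sketch of latent imagination and the actor-critic update, written in PyTorch. The network sizes, the toy MLP dynamics, and the simple n-step return (in place of the paper's λ-returns and RSSM world model) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Toy dimensions (assumptions for illustration, not the paper's values).
LATENT, ACTION, HORIZON, GAMMA = 32, 4, 15, 0.99

# Stand-ins for the learned world model components and behavior models.
dynamics = nn.Sequential(nn.Linear(LATENT + ACTION, 256), nn.ELU(), nn.Linear(256, LATENT))
reward   = nn.Sequential(nn.Linear(LATENT, 256), nn.ELU(), nn.Linear(256, 1))
value    = nn.Sequential(nn.Linear(LATENT, 256), nn.ELU(), nn.Linear(256, 1))
actor    = nn.Sequential(nn.Linear(LATENT, 256), nn.ELU(), nn.Linear(256, ACTION), nn.Tanh())

actor_opt = torch.optim.Adam(actor.parameters(), lr=8e-5)
value_opt = torch.optim.Adam(value.parameters(), lr=8e-5)

def imagine(start_states):
    """Roll out the learned dynamics purely in latent space (no environment calls)."""
    states, rewards = [start_states], []
    s = start_states
    for _ in range(HORIZON):
        a = actor(s)                          # action model proposes an action
        s = dynamics(torch.cat([s, a], -1))   # predicted next latent state
        states.append(s)
        rewards.append(reward(s))             # predicted reward for that state
    return torch.stack(states), torch.stack(rewards)

# Starting latent states; in the full agent these come from the world model's
# posterior over replayed observations.
start = torch.randn(64, LATENT)
states, rewards = imagine(start)

# Value targets: discounted sum of imagined rewards plus a bootstrap value at the
# horizon (simplified n-step return instead of the paper's lambda-returns).
with torch.no_grad():
    returns = value(states[-1])
targets = []
for r in reversed(rewards):
    returns = r + GAMMA * returns
    targets.insert(0, returns)
targets = torch.stack(targets)

# Action model: maximize imagined returns by backpropagating analytic gradients
# through the differentiable reward and dynamics networks into the actor.
actor_loss = -targets.mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

# Value model: regress toward the imagined returns (Bellman consistency),
# with the imagined states detached so only the value network is updated.
value_loss = ((value(states[:-1].detach()) - targets.detach()) ** 2).mean()
value_opt.zero_grad()
value_loss.backward()
value_opt.step()
```

The key design point this sketch illustrates is that the actor's gradient flows analytically through the imagined trajectory (actor → dynamics → reward), which is what distinguishes Dreamer from likelihood-ratio policy gradients computed on replayed experience.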
[slides and audio] Dream to Control: Learning Behaviors by Latent Imagination