4 Jun 2019 | Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
This paper introduces PlaNet, a model-based agent that learns latent dynamics from pixels and performs fast online planning in latent space to solve continuous control tasks. The agent handles tasks with sparse rewards, partial observability, and contact dynamics, which were previously challenging for planning with learned models.

The key contributions are threefold. First, planning in latent space: the agent uses a recurrent state-space model (RSSM) that allows fast evaluation of action sequences entirely in latent space, with the cross-entropy method (CEM) used for planning. Second, the RSSM combines deterministic and stochastic transition components, which lets it predict rewards accurately over multiple time steps. Third, the paper introduces latent overshooting, a multi-step variational inference objective that improves long-term predictions and is compatible with any latent sequence model; the model is trained on a variational bound that includes these multi-step predictions.

Experiments on six continuous control tasks from pixels show that PlaNet achieves performance comparable to the best model-free algorithms while requiring significantly fewer episodes and similar computation time. The results demonstrate that learning latent dynamics models for planning in image domains is a promising approach.
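The deterministic-plus-stochastic transition at the heart of the RSSM can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the weight matrices are random placeholders (a real RSSM learns them from data and uses a GRU for the recurrent update), and all dimensions are illustrative.

```python
import numpy as np

def init_params(h_dim, s_dim, a_dim, rng):
    """Random placeholder weights; a real RSSM learns these from data."""
    in_dim = h_dim + s_dim + a_dim
    return {
        "W_h": 0.1 * rng.standard_normal((h_dim, in_dim)),
        "W_mu": 0.1 * rng.standard_normal((s_dim, h_dim)),
        "W_std": 0.1 * rng.standard_normal((s_dim, h_dim)),
    }

def rssm_step(h, s, a, params, rng):
    """One latent transition of a toy recurrent state-space model.

    h: deterministic recurrent state, s: stochastic latent sample, a: action.
    The deterministic path carries information reliably across many steps,
    while the stochastic path captures uncertainty about the future.
    """
    x = np.concatenate([h, s, a])
    # Deterministic component: simple recurrent update (a GRU in the paper).
    h_next = np.tanh(params["W_h"] @ x)
    # Stochastic component: diagonal Gaussian conditioned on h_next.
    mean = params["W_mu"] @ h_next
    std = np.log1p(np.exp(params["W_std"] @ h_next))  # softplus keeps std > 0
    s_next = mean + std * rng.standard_normal(mean.shape)
    return h_next, s_next
```

Because the transition operates only on the compact states `(h, s)` and never decodes images, rolling out thousands of candidate action sequences for planning is cheap.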
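The CEM planner mentioned above can be sketched in a few lines. This is a generic cross-entropy method over open-loop action sequences under illustrative hyperparameters, not PlaNet's exact configuration; in the agent, `reward_fn` would be the sum of rewards predicted by the learned latent model.

```python
import numpy as np

def cem_plan(reward_fn, horizon=12, action_dim=2, iterations=10,
             candidates=1000, top_k=100, seed=0):
    """Cross-entropy method over open-loop action sequences.

    reward_fn maps a (horizon, action_dim) action sequence to a scalar
    return. Returns the final mean sequence; the agent executes only the
    first action and replans at the next step (model-predictive control).
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences from the current belief.
        noise = rng.standard_normal((candidates, horizon, action_dim))
        samples = mean + std * noise
        returns = np.array([reward_fn(seq) for seq in samples])
        # Refit the belief to the top-K highest-return sequences.
        elite = samples[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean
```

On a toy quadratic reward peaked at action 0.5, the refitted mean converges toward the optimum within a few iterations, which is what makes the method practical for fast online replanning.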