4 Jun 2019 | Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
The paper introduces PlaNet, a model-based agent designed to learn environment dynamics from image observations and plan actions in latent space. PlaNet addresses the challenge of accurate dynamics modeling for planning in unknown environments, a long-standing issue in reinforcement learning. The agent uses a recurrent state-space model (RSSM) with both deterministic and stochastic transition components to capture multiple possible futures and improve long-term predictions. A novel variational inference objective, named latent overshooting, is proposed to encourage multi-step predictions in latent space. PlaNet outperforms model-free algorithms like A3C and D4PG on various continuous control tasks from pixels, achieving similar or better performance with significantly fewer episodes and comparable computation time.
The key contributions include the design of a latent dynamics model, the introduction of latent overshooting, and the demonstration of PlaNet's effectiveness in solving challenging tasks.
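As a rough illustration of the deterministic and stochastic transition components described above, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes scalar states, hand-picked weights (the hypothetical `PARAMS`), and a tanh recurrence standing in for the recurrent network; the names `rssm_step` and `rollout` are illustrative only.

```python
import math
import random


def rssm_step(h_prev, s_prev, a_prev, params, rng):
    """One prior transition step of a toy RSSM-style model (scalars only)."""
    # Deterministic path: a tanh recurrence stands in for the recurrent network
    # (an assumption made to keep the sketch short).
    h = math.tanh(
        params["w_h"] * h_prev + params["w_s"] * s_prev + params["w_a"] * a_prev
    )
    # Stochastic path: a Gaussian over the next stochastic state, conditioned on h.
    mean = params["w_mean"] * h
    # Softplus plus a floor keeps the standard deviation strictly positive.
    std = math.log1p(math.exp(params["w_std"] * h)) + params["min_std"]
    s = rng.gauss(mean, std)
    return h, s, mean, std


def rollout(h0, s0, actions, params, seed=0):
    """Imagine a latent trajectory for a fixed action sequence."""
    rng = random.Random(seed)
    h, s = h0, s0
    trajectory = []
    for a in actions:
        h, s, mean, std = rssm_step(h, s, a, params, rng)
        trajectory.append((h, s, mean, std))
    return trajectory


# Hypothetical weights chosen by hand; a real model would learn these.
PARAMS = {
    "w_h": 0.5, "w_s": 0.3, "w_a": 0.8,
    "w_mean": 1.0, "w_std": 0.2, "min_std": 0.1,
}

if __name__ == "__main__":
    actions = [1.0, -1.0, 0.5]
    for seed in (0, 1):
        sampled = [round(s, 3) for _, s, _, _ in rollout(0.0, 0.0, actions, PARAMS, seed)]
        print(f"seed={seed} sampled states: {sampled}")
```

Rolling out the same action sequence under different seeds gives the same first-step deterministic state but different sampled states, which is one way such a model can represent multiple possible futures while the deterministic path carries information across steps.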