17 Mar 2020 | Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
Dreamer is a reinforcement learning agent designed to solve long-horizon tasks from images using latent imagination. It learns behaviors by propagating analytic gradients of learned state values through trajectories imagined in the compact state space of a world model. The key contributions of Dreamer include:
1. **Learning Long-Horizon Behaviors**: Dreamer efficiently learns behaviors by predicting both actions and state values in the latent space, allowing it to consider rewards beyond the imagination horizon.
2. **Empirical Performance**: On 20 challenging visual control tasks, Dreamer outperforms existing approaches in data-efficiency, computation time, and final performance.
3. **World Model**: Dreamer uses a latent dynamics model that predicts future rewards given actions and past observations, enabling efficient long-term predictions and parallel trajectory generation (a rollout sketch is given below).
4. **Actor-Critic Algorithm**: Dreamer employs an actor-critic approach to learn behaviors, optimizing a parametric policy by propagating analytic gradients of multi-step values back through the learned latent dynamics (see the value-learning sketch below).
The paper also discusses the components of Dreamer, including the latent dynamics model, action and value models, and the learning objective. It compares Dreamer to existing methods and evaluates its performance on various control tasks, demonstrating its effectiveness in solving tasks with long horizons, continuous actions, discrete actions, and early termination.
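To make the world-model idea concrete, here is a minimal sketch of imagining trajectories purely in latent space. The GRU-based transition, the layer sizes, and the deterministic reward head are simplifying assumptions for illustration; the paper's actual world model (an RSSM) is stochastic and is trained from encoded observations.

```python
# Minimal sketch of latent imagination (illustrative, not the paper's code).
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    def __init__(self, state_dim=200, action_dim=6):
        super().__init__()
        self.cell = nn.GRUCell(action_dim, state_dim)   # transition: (state, action) -> next state
        self.reward_head = nn.Linear(state_dim, 1)      # predicts reward from a latent state

    def imagine(self, start_state, policy, horizon=15):
        """Roll the model forward for `horizon` steps without ever decoding images."""
        states, rewards = [], []
        state = start_state
        for _ in range(horizon):
            action = policy(state)                      # action chosen from the latent state
            state = self.cell(action, state)            # predicted next latent state
            rewards.append(self.reward_head(state))     # predicted reward of the new state
            states.append(state)
        return torch.stack(states), torch.stack(rewards)

# Example: imagine 15 steps for a batch of 16 start states in parallel.
model = LatentDynamics()
policy = nn.Sequential(nn.Linear(200, 6), nn.Tanh())    # stand-in for the action model
states, rewards = model.imagine(torch.zeros(16, 200), policy)
```

Because every step stays in the compact latent space, whole batches of trajectories can be imagined in parallel with a few matrix multiplications, which is what makes long-horizon prediction cheap.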
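The actor-critic part can be sketched similarly. The λ-return recursion below is the standard TD(λ)-style backup that Dreamer's value estimate builds on; the discount, λ, tensor shapes, and the loss expressions in the comments are illustrative assumptions rather than the paper's exact implementation.

```python
# Minimal sketch of lambda-return targets over an imagined trajectory (illustrative).
import torch

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """rewards, values: tensors of shape (horizon, batch, 1) from imagined rollouts.
    Recursion: V_lam(s_t) = r_t + gamma * ((1 - lam) * v(s_{t+1}) + lam * V_lam(s_{t+1})),
    bootstrapped with the learned value of the final imagined state."""
    returns = []
    next_return = values[-1]
    for t in reversed(range(rewards.shape[0])):
        next_value = values[t + 1] if t + 1 < rewards.shape[0] else values[-1]
        next_return = rewards[t] + gamma * ((1 - lam) * next_value + lam * next_return)
        returns.append(next_return)
    return torch.stack(returns[::-1])

# Actor update (sketch): because rewards and values are differentiable functions of
# the imagined latent states, maximizing the mean lambda-return backpropagates
# analytic gradients through the value model, reward model, and latent dynamics
# into the action model:   actor_loss = -lambda_returns(rewards, values).mean()
# Value update (sketch): regress the value model toward detached lambda-returns:
#   value_loss = 0.5 * (values - lambda_returns(rewards, values).detach()).pow(2).mean()
```

Predicting values in this way is what lets Dreamer account for rewards beyond the imagination horizon: the bootstrap term v(s_H) summarizes everything the trajectory would earn after the rollout ends.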