This paper presents a method for training agents in reinforcement learning (RL) environments using recurrent world models and evolution strategies. A generative recurrent neural network (RNN) is first trained in an unsupervised manner to model the environment, extracting compressed spatiotemporal features that are then used to train a simple policy via evolution. The agent can even be trained entirely within an environment generated by its own world model, with the learned policy then transferred back to the actual environment. Throughout, the world model's internal representations provide the state features from which a controller selects actions.
The agent consists of three components: a visual sensory component (V) that compresses each visual frame into a small latent vector, a memory component (M) that predicts future latent states from past observations and actions, and a controller (C) that selects actions from the latent vector together with M's hidden state. The controller is deliberately kept small and simple, and is trained with an evolution strategy, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), to maximize the expected cumulative reward.
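To make the data flow concrete, here is a minimal sketch of the V-M-C decision step and the evolutionary training loop. It is not the authors' code: the dimensions (a 32-dimensional latent from V, a 256-dimensional hidden state from M, a 3-dimensional action), the names controller_action and rollout, the use of the pycma package for CMA-ES, and the stubbed random dynamics standing in for V, M, and the environment are all illustrative assumptions.

```python
# Minimal sketch of the V-M-C agent and its CMA-ES training loop.
# Dimensions, helper names, and the stubbed dynamics are assumptions,
# not a reproduction of the paper's implementation.
import numpy as np
import cma  # pip install cma

Z_DIM, H_DIM, A_DIM = 32, 256, 3               # latent, RNN hidden, action sizes
PARAM_DIM = (Z_DIM + H_DIM) * A_DIM + A_DIM    # weights + bias of the linear controller


def controller_action(params, z, h):
    """C: a single linear layer over the concatenated [z, h] features."""
    W = params[: (Z_DIM + H_DIM) * A_DIM].reshape(A_DIM, Z_DIM + H_DIM)
    b = params[(Z_DIM + H_DIM) * A_DIM:]
    return np.tanh(W @ np.concatenate([z, h]) + b)


def rollout(params, steps=200, seed=0):
    """Stub rollout: random features stand in for V's latent vector, M's
    hidden state, and the environment's reward, so the sketch runs as-is."""
    rng = np.random.default_rng(seed)
    z, h, total_reward = rng.normal(size=Z_DIM), np.zeros(H_DIM), 0.0
    for _ in range(steps):
        a = controller_action(params, z, h)
        # Placeholder dynamics and reward, standing in for env.step(a).
        z = rng.normal(size=Z_DIM)
        h = np.tanh(0.9 * h + rng.normal(scale=0.1, size=H_DIM))
        total_reward += float(a @ rng.normal(size=A_DIM))
    return total_reward


# Evolve only the controller's parameters with CMA-ES.
# pycma minimizes, so the cumulative reward is negated.
es = cma.CMAEvolutionStrategy(np.zeros(PARAM_DIM), 0.1, {"popsize": 16})
for generation in range(10):
    candidates = es.ask()
    fitnesses = [-rollout(np.asarray(p)) for p in candidates]
    es.tell(candidates, fitnesses)
best_params = es.result.xbest
```

The point of the design is visible in the parameter count: because V and M carry the representational burden, only the few hundred parameters of the linear controller need to be evolved, which is what makes a derivative-free method like CMA-ES practical.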
The method is tested on a car racing task (CarRacing-v0) and a VizDoom-based video game environment (DoomTakeCover-v0). On the car racing task, the agent achieves a score of 906 ± 21, above the 900-point average required to solve the task and better than previously reported methods. On DoomTakeCover-v0, a policy trained entirely inside the world model's generated environment achieves 1092 ± 556 when transferred to the actual environment, above the 750-point threshold for solving the task, showing that a policy trained in a generated environment can transfer successfully to the actual one.
The paper also discusses the limitations of the approach, in particular the agent's tendency to exploit imperfections in the generated environment, finding policies that score well inside the dream but fail in reality. To address this, a temperature parameter of the world model is adjusted to control the amount of uncertainty in the generated environment: increasing the temperature makes the generated environment noisier and more difficult, which discourages the agent from exploiting the model's imperfections and improves transfer to the actual environment.
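The temperature enters when sampling the next latent state from the memory model's mixture-of-Gaussians output. The sketch below shows one common way such a temperature τ is applied; the 5-component mixture, the 32-dimensional latent, the function name sample_next_z, and the exact scaling (logits divided by τ, Gaussian noise scaled by √τ) are assumptions for illustration rather than the paper's exact code.

```python
# Sketch of temperature-controlled sampling from an MDN-RNN output.
# Mixture size, latent size, and the precise use of tau are illustrative
# assumptions, not a reproduction of the paper's implementation.
import numpy as np

Z_DIM, N_MIX = 32, 5  # latent size and number of mixture components


def sample_next_z(logit_pi, mu, log_sigma, tau=1.0, rng=None):
    """Sample z_{t+1} from a per-dimension Gaussian mixture.

    logit_pi, mu, log_sigma: arrays of shape (Z_DIM, N_MIX) produced by M.
    tau > 1 flattens the mixture weights and widens the Gaussians, making
    the generated environment more stochastic and harder to exploit.
    """
    rng = rng or np.random.default_rng()
    # Flatten the mixture weights by dividing the logits by tau (softmax).
    scaled = logit_pi / tau
    pi = np.exp(scaled - scaled.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    # Pick one component per latent dimension, then add temperature-scaled noise.
    z_next = np.empty(Z_DIM)
    for d in range(Z_DIM):
        k = rng.choice(N_MIX, p=pi[d])
        z_next[d] = mu[d, k] + np.exp(log_sigma[d, k]) * np.sqrt(tau) * rng.normal()
    return z_next


# Example call with random mixture parameters standing in for M's output;
# a value of tau slightly above 1 yields a noisier generated environment.
params = [np.random.randn(Z_DIM, N_MIX) for _ in range(3)]
z1 = sample_next_z(*params, tau=1.15)
```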
The paper concludes that the proposed approach offers a promising method for training agents in RL environments, with the potential to improve performance and reduce the need for extensive training in the actual environment. The method is also applicable to a wide range of tasks, including those involving high-dimensional visual data and complex environments.