World Models

9 May 2018 | David Ha, Jürgen Schmidhuber
The paper explores generative neural network models, known as *world models*, that represent popular reinforcement learning environments. These world models are trained in an unsupervised manner to learn compressed spatial and temporal representations of the environment. By using features extracted from the world model as inputs to an agent, a compact and simple policy can be trained to solve tasks. The agent can even be trained entirely inside the hallucinated dream generated by its world model, and the learned policy can then be transferred back to the actual environment.

The authors draw parallels between human cognition and the predictive models used in reinforcement learning. Humans build mental models of the world from limited sensory input and use them to guide their decisions and actions; likewise, reinforcement learning agents benefit from a good representation of past and present states as well as a predictive model of the future.

The paper presents a simplified framework for training large RNN-based agents, in which the agent is divided into a large world model and a small controller model. The world model learns a compressed representation of the environment, while the controller learns to perform the task using that representation. This division allows large models to be trained efficiently and focuses the credit assignment problem on a much smaller search space.

The authors demonstrate the method on the Car Racing and VizDoom environments. In the Car Racing task, the agent achieves an average score of 906 over 100 random trials, solving the task and obtaining a state-of-the-art result. In the VizDoom experiment, the agent learns to avoid fireballs and survives for an average of 1100 time steps over 100 random trials, significantly outperforming previous methods.

The paper also discusses challenges and limitations, including the need for an iterative training procedure to handle more complex tasks and the risk of adversarial policies that exploit flaws in the world model. As future directions, the authors suggest incorporating artificial curiosity and intrinsic motivation to encourage novel exploration.
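The split between a large world model and a tiny controller is the core of the framework: a vision model (V) compresses each frame into a latent vector, a memory RNN (M) tracks the temporal context, and a small controller (C) maps both to an action. The PyTorch sketch below is a minimal illustration of that decomposition; the layer sizes (32-dimensional latent, 256-unit LSTM, 3 actions) follow the Car Racing setup described in the paper, but the module layout is an illustrative reconstruction, not the authors' released code.

```python
# Minimal sketch of the V-M-C decomposition described above (illustrative
# reconstruction, not the authors' released code).
import torch
import torch.nn as nn


class VisionVAE(nn.Module):
    """V: compresses a 64x64 RGB frame into a small latent vector z."""

    def __init__(self, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),
            nn.Flatten(),                      # 64x64 input -> 1024 features
        )
        self.mu = nn.Linear(1024, z_dim)       # mean of q(z | frame)
        self.logvar = nn.Linear(1024, z_dim)   # log-variance of q(z | frame)

    def encode(self, frame):
        h = self.encoder(frame)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z around the predicted mean.
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()


class MemoryRNN(nn.Module):
    """M: rolls (z_t, a_t) through an LSTM; its hidden state h_t summarises
    the temporal context the controller needs to anticipate the future."""

    def __init__(self, z_dim=32, action_dim=3, h_dim=256):
        super().__init__()
        self.cell = nn.LSTMCell(z_dim + action_dim, h_dim)

    def step(self, z, a, state=None):
        return self.cell(torch.cat([z, a], dim=-1), state)


class Controller(nn.Module):
    """C: a single linear layer mapping [z_t, h_t] to an action, keeping
    the trainable policy tiny and easy to search."""

    def __init__(self, z_dim=32, h_dim=256, action_dim=3):
        super().__init__()
        self.fc = nn.Linear(z_dim + h_dim, action_dim)

    def forward(self, z, h):
        return torch.tanh(self.fc(torch.cat([z, h], dim=-1)))


# One time step through the full agent on a dummy 64x64 frame.
vision, memory, controller = VisionVAE(), MemoryRNN(), Controller()
frame = torch.rand(1, 3, 64, 64)
z = vision.encode(frame)
h, c = memory.step(z, torch.zeros(1, 3))       # previous action = no-op
action = controller(z, h)
```

Because the controller is just one linear map from the concatenated latent vector and RNN hidden state to the action, the policy that actually has to be searched contains only a few hundred parameters, which is what keeps the credit assignment problem so much smaller than training the full network end to end.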