30 Oct 2024 | Eloi Alonso*, Adam Jelley*, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce‡, François Fleuret‡
The paper introduces DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained within a diffusion world model. The authors argue that visual details matter in reinforcement learning and that recent world models, which often compress environment dynamics into discrete latent variables, may discard crucial visual information. Diffusion models, known for their ability to generate high-resolution images by reversing a noising process, are proposed as a solution to this issue. DIAMOND's world model is a diffusion model conditioned on past observations and actions, which allows it to generate more accurate and detailed trajectories.
The agent achieves a mean human-normalized score of 1.46 on the Atari 100k benchmark, outperforming other world model baselines. The paper also demonstrates that DIAMOND's diffusion world model can be used as an interactive neural game engine, successfully modeling 3D environments like Counter-Strike: Global Offensive. The authors discuss the limitations and future directions for improving the model, emphasizing the need for more faithful and interactive world models to enhance sample efficiency and safety in reinforcement learning.
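To make the 1.46 figure concrete, here is a minimal sketch of how a mean human-normalized score is typically computed, assuming the usual Atari convention of normalizing each game's raw score between a random-policy reference and a human reference. The function name, game names, and all numbers below are hypothetical and chosen for illustration; they are not values from the paper.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return (agent - random) / (human - random).

    0.0 corresponds to random play and 1.0 to the human reference score.
    """
    if human == random:
        raise ValueError("human and random reference scores must differ")
    return (agent - random) / (human - random)


# Hypothetical raw scores per game as (agent, random, human); NOT from the paper.
games = {
    "game_a": (150.0, 50.0, 100.0),
    "game_b": (30.0, 10.0, 50.0),
}

# Normalize each game separately, then average across games.
per_game = {name: human_normalized_score(*scores) for name, scores in games.items()}
mean_hns = mean(per_game.values())
```

Under this convention, a mean above 1 indicates that the agent exceeds the human reference on average across games, although individual games may still fall below it.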