Improved off-policy training of diffusion samplers

26 May 2024 | Marcin Sendera, Minsu Kim, Sarthak Mittal, Pablo Lemos, Luca Scimeca, Jarrid Rector-Brooks, Alexandre Adam, Yoshua Bengio, Nikolay Malkin
This paper addresses the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. It benchmarks several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). The authors propose a novel exploration strategy for off-policy methods, based on local search in the target space with a replay buffer, which improves sample quality across a variety of target distributions. They also introduce a unified library for diffusion-structured samplers that focuses on off-policy methods but includes simulation-based variational objectives, facilitating benchmarking and reproducibility, along with new modeling and training techniques such as the Langevin parametrization and off-policy exploration methods. The results highlight the importance of off-policy exploration and improved credit assignment in diffusion-structured samplers, and suggest that the Langevin parametrization and local search with a replay buffer are effective techniques for enhancing sample quality and mode coverage.
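
The local-search-with-replay-buffer idea can be pictured with a short sketch. The code below is a minimal illustration under assumptions, not the authors' implementation or their library's API: the names `local_search_refine`, `log_reward`, `max_buffer_size`, `n_steps`, and `step_size` are placeholders, and unadjusted Langevin ascent stands in for whichever local-search kernel one prefers (a Metropolis-adjusted variant could be substituted). The essential pattern is: take terminal samples from the current sampler, locally refine them toward high-density regions of the target, and store the refined points for off-policy replay.

```python
import torch

def local_search_refine(log_reward, x, buffer, max_buffer_size=10_000,
                        n_steps=100, step_size=1e-3):
    """Refine sampler outputs with unadjusted Langevin steps on the target
    log-density, then store the refined points in a replay buffer.

    log_reward: callable returning log R(x) for a batch x (unnormalized).
    x: terminal samples drawn from the current diffusion sampler.
    buffer: a plain Python list used as a FIFO replay buffer.
    All names and hyperparameter values here are illustrative, not the
    paper's exact interface.
    """
    x = x.detach().requires_grad_(True)
    for _ in range(n_steps):
        grad = torch.autograd.grad(log_reward(x).sum(), x)[0]
        # Langevin update: drift toward high log-reward, plus Gaussian noise.
        x = x + step_size * grad + (2 * step_size) ** 0.5 * torch.randn_like(x)
        x = x.detach().requires_grad_(True)
    buffer.extend(x.detach().unbind(0))
    del buffer[:-max_buffer_size]  # FIFO eviction once the buffer is full
    return buffer
```

Because off-policy objectives such as trajectory balance do not require samples from the current policy, the buffer contents can then be replayed as terminal states of backward (noising) trajectories during training. The Langevin parametrization mentioned above is complementary: rather than refining samples after the fact, it injects the target score, grad of log R(x), into the sampler's learned drift at every denoising step.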