26 May 2024 | Marcin Sendera, Minsu Kim, Sarthak Mittal, Pablo Lemos, Luca Scimeca, Jarrid Rector-Brooks, Alexandre Adam, Yoshua Bengio, Nikolay Malkin
This paper studies improved off-policy training of diffusion samplers for amortized inference. The authors benchmark a range of diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods such as continuous generative flow networks (GFlowNets). They propose a novel exploration strategy for the off-policy methods, local search in the target space with a replay buffer, which improves sample quality across a variety of target distributions. They also release a unified library of diffusion-structured samplers covering both off-policy methods and simulation-based variational objectives.
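As a rough illustration of this exploration strategy, the sketch below refines sampler outputs with Metropolis-adjusted Langevin (MALA) steps on the target log-density before storing them; `log_reward` and the function names are illustrative assumptions, not the paper's actual API.

```python
import torch

def mala_refine(x, log_reward, n_steps=20, step=1e-2):
    """Local search in target space: refine sampler outputs with
    Metropolis-adjusted Langevin (MALA) steps on the target log-density.
    Illustrative sketch, not the paper's implementation."""
    def logp_and_grad(z):
        z = z.detach().requires_grad_(True)
        lp = log_reward(z)
        g, = torch.autograd.grad(lp.sum(), z)
        return lp.detach(), g.detach()

    x = x.detach()
    lp, g = logp_and_grad(x)
    for _ in range(n_steps):
        # Langevin proposal: gradient step plus Gaussian noise
        prop = x + step * g + (2 * step) ** 0.5 * torch.randn_like(x)
        lp_prop, g_prop = logp_and_grad(prop)
        # Metropolis-Hastings correction for the asymmetric proposal
        fwd = -((prop - x - step * g) ** 2).sum(-1) / (4 * step)
        bwd = -((x - prop - step * g_prop) ** 2).sum(-1) / (4 * step)
        log_alpha = lp_prop - lp + bwd - fwd
        accept = torch.rand_like(log_alpha).log() < log_alpha
        x = torch.where(accept.unsqueeze(-1), prop, x)
        lp = torch.where(accept, lp_prop, lp)
        g = torch.where(accept.unsqueeze(-1), g_prop, g)
    return x
```

Refined samples would then be pushed into the replay buffer, and off-policy training can fit trajectories that terminate at buffered samples (e.g., obtained by running the backward process from them).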
The paper frames the core problem as training a diffusion model to sample from a distribution specified only by an unnormalized density or energy function, and highlights two central difficulties for diffusion-structured samplers: exploration and credit assignment. The proposed local-search exploration with a replay buffer significantly improves performance on most tasks and metrics, while the Langevin parametrization of the drift and subtrajectory balance objectives are studied as ways to improve credit assignment.
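For concreteness, a subtrajectory-balance-style objective can be sketched as follows; the shapes, the λ-weighting, and the tying of the terminal flow to the log-reward are assumptions based on the general SubTB(λ) formulation, not the paper's exact implementation.

```python
import torch

def subtb_loss(log_flow, log_pf, log_pb, lam=0.9):
    """Subtrajectory balance: for every 0 <= i < j <= T, penalize
        log F(x_i) + sum_{t=i}^{j-1} log P_F  -  log F(x_j) - sum log P_B,
    with each subtrajectory weighted by lam**(j - i).
    Shapes: log_flow (B, T+1), log_pf / log_pb (B, T);
    log_flow[:, -1] is assumed tied to log R(x_T)."""
    B, T = log_pf.shape
    # prefix sums so a subtrajectory sum is a difference of two entries
    cum_f = torch.cat([log_pf.new_zeros(B, 1), log_pf.cumsum(-1)], dim=-1)
    cum_b = torch.cat([log_pb.new_zeros(B, 1), log_pb.cumsum(-1)], dim=-1)
    total, weight = 0.0, 0.0
    for i in range(T):          # O(T^2) loop; fine for a sketch
        for j in range(i + 1, T + 1):
            disc = (log_flow[:, i] + (cum_f[:, j] - cum_f[:, i])
                    - log_flow[:, j] - (cum_b[:, j] - cum_b[:, i]))
            w = lam ** (j - i)
            total = total + w * disc.pow(2).mean()
            weight = weight + w
    return total / weight
```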
The experiments cover sampling from unnormalized energy-based densities as well as conditional sampling from the latent posterior of a variational autoencoder (VAE). The authors compare classical sampling methods, simulation-driven variational approaches, and diffusion-based GFlowNet samplers, finding that the proposed techniques, particularly local search with a replay buffer and the Langevin parametrization, significantly improve sample quality and training efficiency.
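In the VAE task, the unnormalized target over latents is the product of likelihood and prior. A minimal sketch, assuming a Bernoulli decoder that returns logits (the decoder and data here are placeholders, not the paper's benchmark setup):

```python
import torch
import torch.nn.functional as F

def vae_latent_log_reward(z, x, decoder):
    """Unnormalized log-posterior over VAE latents: log p(x | z) + log p(z).
    Assumes a Bernoulli decoder returning logits; names are illustrative."""
    logits = decoder(z)                           # (B, D) logits for one datum x
    log_lik = -F.binary_cross_entropy_with_logits(
        logits, x.expand_as(logits), reduction="none").sum(-1)
    log_prior = -0.5 * (z ** 2).sum(-1)           # N(0, I) prior, up to a constant
    return log_lik + log_prior
```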
Finally, the paper extends these methods to more general SDE learning problems, including optimizing the backward (noising) process and learning the diffusion rate of the forward process. The authors conclude that these techniques are promising for improving the mode coverage and efficiency of diffusion samplers, and they stress the importance of understanding the continuous-time limit of all the algorithms studied.
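A generic Euler-Maruyama rollout with a learnable diffusion rate illustrates, in discretized form, what learning the forward process's diffusion rate amounts to; `drift` and `log_sigma` stand in for learnable modules and are not the library's interface.

```python
import torch

def euler_maruyama(x0, drift, log_sigma, n_steps=100, dt=0.01):
    """Integrate dx = drift(x, t) dt + sigma(t) dW by Euler-Maruyama,
    where sigma(t) = exp(log_sigma(t)) is a learnable diffusion rate.
    A generic sketch under assumed module signatures."""
    x, traj = x0, [x0]
    for i in range(n_steps):
        t = torch.full((x0.shape[0], 1), i * dt)
        noise = torch.randn_like(x)
        x = x + drift(x, t) * dt + log_sigma(t).exp() * dt ** 0.5 * noise
        traj.append(x)
    return torch.stack(traj)  # (n_steps + 1, B, D) trajectory
```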