2025-01-13 | Siddarth Venkatraman, Moksh Jain, Luca Scimeca, Minsu Kim, Marcin Sendera, Mohsin Hasan, Luke Rowe, Sarthak Mittal, Pablo Lemos, Emmanuel Bengio, Alexandre Adam, Jarrid Rector-Brooks, Yoshua Bengio, Glen Berseth, Nikolay Malkin
This paper introduces a method called relative trajectory balance (RTB) for training diffusion models to generate unbiased posterior samples under a diffusion prior and an arbitrary reward function. Diffusion models are powerful generative models used in vision, language, and reinforcement learning. However, using them as priors in downstream tasks poses an intractable posterior inference problem. RTB addresses this by providing an asymptotically unbiased training objective that allows for efficient sampling from complex posterior distributions. The method is derived from the perspective of diffusion models as continuous generative flow networks, enabling the use of deep reinforcement learning techniques to improve mode coverage.
The paper demonstrates the versatility of RTB across various domains: in vision, it achieves competitive classifier-guided image generation and improves caption-conditioned generation under text-to-image foundation models. In language modeling, it reports strong results for infilling tasks with discrete diffusion language models. In continuous control, it achieves state-of-the-art results on benchmarks that leverage score-based behavior priors.
Because RTB is grounded in the generative flow network perspective on diffusion models, the paper is able to analyze its theoretical foundations, including its relation to the trajectory balance objective and its support for off-policy training, which helps improve mode coverage.
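The core of the relative trajectory balance objective is a squared log-ratio constraint: along a denoising trajectory, the learned posterior sampler's probability (times a learned normalizing constant) should match the prior's trajectory probability reweighted by the reward of the terminal sample. A minimal numerical sketch of this loss, with hypothetical function and variable names (the paper's actual implementation operates on diffusion model log-densities and trains the sampler and `log_Z` jointly):

```python
import numpy as np

def rtb_loss(log_pf_post, log_pf_prior, log_reward, log_Z):
    """Squared log-ratio loss for one denoising trajectory (sketch).

    log_pf_post : per-step log-probabilities of the trajectory under
                  the trainable posterior sampler
    log_pf_prior: per-step log-probabilities of the same steps under
                  the frozen diffusion prior
    log_reward  : log r(x_0) of the trajectory's terminal sample
    log_Z       : learned estimate of the log normalizing constant

    At the optimum the ratio vanishes for every trajectory, i.e. the
    posterior sampler's trajectory density equals prior * reward / Z.
    """
    log_ratio = (log_Z + np.sum(log_pf_post)
                 - (np.sum(log_pf_prior) + log_reward))
    return log_ratio ** 2

# Toy example: two denoising steps.
loss = rtb_loss(
    log_pf_post=np.array([-1.0, -2.0]),
    log_pf_prior=np.array([-1.5, -2.5]),
    log_reward=-1.0,
    log_Z=0.0,
)
```

Because the loss only requires log-probabilities of a sampled trajectory, it can be minimized on trajectories drawn off-policy (e.g. from a replay buffer), which is what enables the reinforcement-learning-style exploration mentioned above.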
The method is evaluated on a range of tasks, including image generation, text infilling, and continuous control, and is shown to achieve state-of-the-art results. The paper also discusses the limitations of RTB, including the computational cost of simulation-based training and the potential for high variance in gradients. Overall, the paper presents a novel approach to training diffusion models for posterior inference under a diffusion prior, with broad applications across various domains.