Implicit Diffusion: Efficient Optimization through Stochastic Sampling


May 22, 2024 | Pierre Marion, Anna Korba, Peter Bartlett, Mathieu Blondel, Valentin De Bortoli, Arnaud Doucet, Felipe Llinares-López, Courtney Paquette, Quentin Berthet
Implicit Diffusion is an algorithm for optimizing distributions that are defined implicitly by parameterized stochastic diffusions: the method modifies the outcome distribution of a sampling process by optimizing over the parameters of that process. It introduces a general framework for first-order optimization of these processes that performs optimization and sampling jointly, in a single loop. Inspired by bilevel optimization and automatic implicit differentiation, the approach views sampling itself as optimization over the space of probability distributions. The paper provides theoretical guarantees and experimental results demonstrating the method's effectiveness, with applications to training energy-based models and fine-tuning denoising diffusion models.

Sampling from a target distribution is central to many methods in machine learning, optimization, and statistics. Sampling algorithms increasingly rely on iteratively applying large-scale parameterized functions, such as neural networks, as in denoising diffusion models. This iterative process implicitly maps a parameter θ to a distribution π*(θ). The paper focuses on optimization problems over these implicitly parameterized distributions: the main problem is to minimize a function F over the distribution π*(θ), a formulation that encompasses learning parameterized Langevin diffusions and contrastive learning of energy-based models.
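To make the setting concrete, the problem can be written as a bilevel-style program in which the outer variable is θ and the inner problem is the sampling dynamic itself. The display below is a minimal sketch using generic notation (ℓ, F, V, π* are placeholder symbols chosen here, not necessarily the paper's notation); the Langevin example illustrates one way π*(θ) can be defined implicitly.

```latex
% Outer problem: optimize a functional F of the distribution produced by sampling.
\min_{\theta \in \mathbb{R}^p} \; \ell(\theta) := \mathcal{F}\big(\pi^\star(\theta)\big),
\qquad \text{where } \pi^\star(\theta) \text{ is defined implicitly by a sampling process.}

% Example (Langevin diffusion with potential V(\cdot,\theta)):
% \pi^\star(\theta) is the stationary distribution of the diffusion
\mathrm{d}X_t = -\nabla_x V(X_t,\theta)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t,
\qquad \pi^\star(\theta)(x) \propto \exp\big(-V(x,\theta)\big).
```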
The central difficulty is computing gradients of functions of the target distribution with respect to the parameter θ, which requires differentiating through a sampling operation. The paper leverages the view of sampling as optimization over the space of probability distributions, drawing a link between optimization through stochastic sampling and bilevel optimization. It works through several examples, including Langevin dynamics and denoising diffusion, and presents methods for gradient estimation through sampling (a standard gradient identity for the Langevin case is sketched below).

The Implicit Diffusion optimization algorithm circumvents solving the inner sampling problem to convergence at each step: instead, it maintains a single dynamic of probabilities and performs optimization and sampling jointly in one loop (see the code sketch at the end of this summary). Theoretical analysis covers both continuous- and discrete-time settings for the Langevin and denoising diffusion cases, with convergence results under suitable assumptions showing that the method can efficiently optimize through sampling. Experimentally, the algorithm outperforms nested-loop approaches in terms of gradient evaluations and computational cost, and is applied to reward training of Langevin processes and of denoising diffusion models.
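For the Langevin example, one classical route to gradient estimation through sampling is the Gibbs-distribution identity below. It is stated for a linear objective F(π) = E_π[r(X)] with a reward-style function r; this is an illustrative special case of differentiating through the sampled distribution, not necessarily the paper's exact estimator.

```latex
% For \pi^\star(\theta)(x) \propto \exp(-V(x,\theta)) and an outer objective
% \ell(\theta) = \mathbb{E}_{X \sim \pi^\star(\theta)}[r(X)], differentiating under
% the integral gives a covariance formula that can be estimated from samples:
\nabla_\theta \ell(\theta)
= -\,\mathrm{Cov}_{X \sim \pi^\star(\theta)}\!\big(r(X),\, \nabla_\theta V(X,\theta)\big)
= -\,\mathbb{E}\big[r(X)\,\nabla_\theta V(X,\theta)\big]
  + \mathbb{E}\big[r(X)\big]\,\mathbb{E}\big[\nabla_\theta V(X,\theta)\big].
```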
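The code below is a minimal, self-contained sketch of the single-loop idea for the Langevin reward-training setting: at each iteration, particles are advanced by one sampling step and θ by one gradient step, with the gradient estimated from the current particles via the covariance identity above. The potential, reward, step sizes, and function names are illustrative assumptions, not the authors' implementation.

```python
# Single-loop "optimize through sampling" sketch for the Langevin case.
# All names (potential, reward, step sizes) are hypothetical choices.
import numpy as np

def grad_x_potential(x, theta):
    # Hypothetical quadratic potential V(x, theta) = 0.5 * ||x - theta||^2,
    # so pi*(theta) is a unit Gaussian centered at theta.
    return x - theta

def grad_theta_potential(x, theta):
    # Gradient of the same potential with respect to theta.
    return theta - x

def reward(x):
    # Hypothetical reward r(x); the outer objective is E_{pi*(theta)}[r(X)].
    return -np.sum(x**2, axis=-1)

def implicit_diffusion_sketch(theta0, n_particles=1024, dim=2,
                              n_steps=5000, eta_x=1e-2, eta_theta=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    x = rng.standard_normal((n_particles, dim))  # particle approximation of pi
    for _ in range(n_steps):
        # (1) One Langevin sampling step on the particles, using the current theta.
        noise = rng.standard_normal(x.shape)
        x = x - eta_x * grad_x_potential(x, theta) + np.sqrt(2 * eta_x) * noise
        # (2) Gradient estimate of the outer objective from the same particles,
        #     via grad_theta E[r] = -Cov(r(X), grad_theta V(X, theta)).
        r = reward(x)
        g_v = grad_theta_potential(x, theta)
        grad_outer = -np.mean((r - r.mean())[:, None] * g_v, axis=0)
        # (3) One ascent step on theta: no waiting for the inner dynamic to converge.
        theta = theta + eta_theta * grad_outer
    return theta, x

# Example: starting from theta = (2, 2), maximizing E[r] pulls theta toward 0.
theta_final, samples = implicit_diffusion_sketch([2.0, 2.0])
print(theta_final)
```

In this toy setting, the expected reward is maximized at θ = 0, and the single loop recovers that optimum while sampling and optimization proceed in lockstep, which is the pattern the paper contrasts with nested-loop approaches.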