Rethinking Score Distillation as a Bridge Between Image Distributions

13 Jun 2024 | David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A Efros, Aleksander Holynski, Angjoo Kanazawa
The paper "Rethinking Score Distillation as a Bridge Between Image Distributions" by David McAllister et al. addresses the limitations of Score Distillation Sampling (SDS) and its variants in generating high-quality images, particularly in data-poor domains. The authors propose a new framework that views SDS as following an optimal-cost transport path from a source distribution to a target distribution, framed as a Schrödinger Bridge (SB) problem. This interpretation reveals two main sources of error: a linear approximation of the optimal path and poor estimates of the source distribution.
To mitigate these issues, the authors suggest using textual descriptions to calibrate the source distribution, which can significantly improve generation quality without increasing computational overhead. The proposed method is applied to various tasks, including text-to-2D generation, text-based NeRF optimization, painting-to-real image translation, optical illusion generation, and 3D sketch-to-real. The results show that the proposed method consistently outperforms or matches the performance of specialized methods like Variational Score Distillation (VSD) in terms of quality and efficiency. The paper also discusses the potential social impacts of using pre-trained diffusion models as priors in optimization frameworks, highlighting both positive and negative aspects.
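To make the "bridge between distributions" reading concrete, here is a minimal toy sketch, not the authors' implementation. It assumes the update direction is the difference between two noise predictions, one conditioned on the target and one on a description of the source, and it replaces the diffusion model with exact noise predictions for unit-variance Gaussians. The names `toy_eps`, `bridge_direction`, and `optimize`, the cosine/sine noise schedule, and the Gaussian stand-ins for text-conditioned distributions are all hypothetical choices made for illustration.

```python
import numpy as np

def toy_eps(x_t, mean, alpha, sigma):
    """Exact noise prediction E[eps | x_t] when data ~ N(mean, I) and
    x_t = alpha * x0 + sigma * eps with alpha**2 + sigma**2 == 1.
    Stands in for a conditioned diffusion model (assumption)."""
    return sigma * (x_t - alpha * mean)

def bridge_direction(x, src_mean, tgt_mean, t, rng):
    """Difference of target-conditioned and source-conditioned noise predictions."""
    alpha, sigma = np.cos(t), np.sin(t)  # variance-preserving schedule (assumption)
    x_t = alpha * x + sigma * rng.standard_normal(x.shape)
    eps_tgt = toy_eps(x_t, tgt_mean, alpha, sigma)
    eps_src = toy_eps(x_t, src_mean, alpha, sigma)
    return eps_tgt - eps_src

def optimize(x0, src_mean, tgt_mean, steps=200, lr=0.05, seed=0):
    """Gradient-style updates that push x away from the source and toward the target."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        t = rng.uniform(0.2, 1.3)  # random noise level per step
        x -= lr * bridge_direction(x, src_mean, tgt_mean, t, rng)
    return x

if __name__ == "__main__":
    src = np.array([0.0, 0.0])
    tgt = np.array([2.0, -1.0])
    start = np.array([0.3, 0.1])
    end = optimize(start, src, tgt, steps=50)
    print("displacement:", end - start)
```

In this toy setting the sampled noise cancels and the direction is a constant multiple of the source-to-target offset, so the sketch only illustrates which way the update points, not convergence. It also suggests why a poor source estimate matters: a wrong `src_mean` changes the direction of every step.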