Rethinking Score Distillation as a Bridge Between Image Distributions

13 Jun 2024 | David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A Efros, Aleksander Holynski, Angjoo Kanazawa
This paper analyzes score distillation sampling (SDS) as a bridge between image distributions and identifies two main sources of error: (1) a linear approximation of the optimal transport path and (2) a poor estimate of the source distribution. According to the analysis, these errors explain the oversaturation and oversmoothing artifacts seen in current SDS methods. The authors propose using textual descriptions to better approximate the source distribution, which improves results without additional computational overhead. The approach is shown to be effective across several domains: text-to-2D generation, text-based NeRF optimization, painting-to-real image translation, optical illusion generation, and 3D sketch-to-real.
Compared with existing SDS variants, the method produces high-frequency details with realistic colors, and its results are comparable to those of variational score distillation (VSD) without VSD's computational overhead. The paper also discusses the implications of using pre-trained diffusion models as priors in optimization tasks and highlights the potential social impacts of such technology.
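The idea of describing the source distribution with text can be illustrated with a toy sketch. The summary does not give the exact update rule, so the formulation below is an assumption: the sampled-noise term in an SDS-style update is replaced by a noise prediction conditioned on the source. The function `toy_eps` is a hypothetical stand-in for a pre-trained diffusion model (it treats each condition as a single point in image space), and `alpha`, `sigma`, `src`, and `tgt` are illustrative names, not the paper's notation.

```python
import numpy as np


def toy_eps(x_t, cond, alpha, sigma):
    """Hypothetical stand-in for a diffusion model's noise prediction.

    Pretends every image matching `cond` is the single point `cond`, so the
    noise that produced x_t = alpha * x0 + sigma * eps can be recovered exactly.
    """
    return (x_t - alpha * cond) / sigma


def sds_direction(x, tgt, alpha, sigma, rng):
    """SDS-style direction: target-conditioned prediction minus sampled noise."""
    eps = rng.standard_normal(x.shape)
    x_t = alpha * x + sigma * eps
    return toy_eps(x_t, tgt, alpha, sigma) - eps


def bridge_direction(x, src, tgt, alpha, sigma, rng):
    """Assumed variant: a source-conditioned prediction replaces the noise."""
    eps = rng.standard_normal(x.shape)
    x_t = alpha * x + sigma * eps
    return toy_eps(x_t, tgt, alpha, sigma) - toy_eps(x_t, src, alpha, sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.zeros(2)               # current "image" being optimized
    src = np.zeros(2)             # toy embedding of the source description
    tgt = np.array([1.0, 2.0])    # toy embedding of the target prompt
    step = bridge_direction(x, src, tgt, alpha=0.8, sigma=0.6, rng=rng)
    # Gradient descent moves x against the direction, i.e. from src toward tgt.
    print(x - 0.1 * step)
```

In this toy setting the direction depends only on the gap between the source and target conditions, which is the intuition behind estimating the source distribution well. A real implementation would replace `toy_eps` with calls to a pre-trained text-conditioned diffusion model and would backpropagate the direction through the image parameterization (for example, a NeRF).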