[slides and audio] Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

StrDiffusion is a structure-guided diffusion model for image inpainting that addresses the semantic discrepancy between masked and unmasked regions. The model reformulates the conventional texture denoising process under structure guidance to derive a simplified denoising objective. Key findings include: 1) semantically sparse structures help alleviate the semantic discrepancy in the early denoising stages, while dense textures generate meaningful semantics in the later stages; 2) unmasked regions provide time-dependent structure guidance for texture denoising. A structure-guided neural network is trained to estimate the simplified denoising objective by exploiting the consistency of the denoised structure between masked and unmasked regions. An adaptive resampling strategy is also introduced to regulate semantic correlations. Extensive experiments on datasets such as PSV, CelebA, and Places2 validate the effectiveness of StrDiffusion, showing better PSNR, SSIM, and FID scores than state-of-the-art methods. The model achieves consistent and meaningful inpainting results by combining the time-dependent guidance of sparse structures with adaptive resampling.
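For readers unfamiliar with the metrics named above, the following is a minimal sketch of how PSNR is conventionally computed between a ground-truth image and an inpainted result. It uses the standard textbook definition, not code from the StrDiffusion authors; the function name `psnr` and the `max_value` parameter are illustrative assumptions, and the paper's exact evaluation protocol (value range, color space, whether only the masked region is scored) is not stated here.

```python
import numpy as np


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB (standard definition).

    `max_value` is the largest possible pixel value; 255 assumes 8-bit
    images. This is an assumption, not the paper's documented setting.
    """
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    if reference.shape != restored.shape:
        raise ValueError("images must have the same shape")

    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher PSNR means the restored image is closer to the reference pixel by pixel. SSIM and FID measure structural and distributional similarity respectively and are usually taken from an existing library rather than reimplemented.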

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

1 Apr 2024 | Haipeng Liu, Yang Wang*, Biao Qian, Meng Wang, Yong Rui