EXPLAINING GENERATIVE DIFFUSION MODELS VIA VISUAL ANALYSIS FOR INTERPRETABLE DECISION-MAKING PROCESS

16 Feb 2024 | Ji-Hoon Park, Yeong-Joon Ju, Seong-Whan Lee
Diffusion models have shown remarkable performance in generating high-fidelity images, but the interpretability of their denoising process remains challenging. To address this, the paper poses three research questions that interpret the diffusion process in terms of visual concepts and the regions the model attends to at each time step, and the authors develop visualization tools that answer these questions to make the process more human-understandable. They examine spatial recovery levels, from semantic content down to fine detail, to understand the model's focal region during denoising. They also explore how specific concepts are highlighted at each denoising step by aligning generated images with the prompts used to produce them. Finally, they extend the analysis to decode the visual concepts embedded across all time steps of the process.

The paper introduces two visualization tools, DF-RISE and DF-CAM, and evaluates them using AUC scores, correlation quantification, and cross-attention mapping. The results show that the proposed tools effectively interpret the diffusion process, enhance human understanding of the image generation process, and pave the way for further research into explainable diffusion mechanisms.
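The name DF-RISE suggests a perturbation-based approach in the spirit of RISE (randomly occlude parts of the input and attribute importance to regions whose occlusion most changes the model's output). The paper's actual formulation is not given here, so the following is only a minimal occlusion-style sketch under that assumption: `denoise_fn` stands in for a single denoising step, and the mask grid, keep probability, and scoring rule are all illustrative choices, not the authors' method.

```python
import numpy as np

def rise_style_saliency(denoise_fn, x, n_masks=500, grid=8, p_keep=0.5, seed=0):
    """Occlusion-style perturbation saliency, loosely in the spirit of RISE.

    Randomly occludes coarse regions of the 2-D input `x`, measures how much
    the output of `denoise_fn` changes, and credits the occluded pixels in
    proportion to that change. `denoise_fn(x)` is a stand-in for one
    denoising prediction; its real signature in DF-RISE is unknown here.
    """
    rng = np.random.default_rng(seed)
    h, w = x.shape
    base = denoise_fn(x)                      # unperturbed prediction
    saliency = np.zeros((h, w))
    total = 0.0
    for _ in range(n_masks):
        # Coarse binary keep-mask, upsampled to full resolution (as in RISE).
        coarse = (rng.random((grid, grid)) < p_keep).astype(float)
        mask = np.kron(coarse, np.ones((h // grid, w // grid)))
        out = denoise_fn(x * mask)
        # Importance weight: how strongly this occlusion changed the output.
        score = np.abs(base - out).mean()
        saliency += score * (1.0 - mask)      # credit the occluded region
        total += score
    return saliency / max(total, 1e-8)        # normalized importance map
```

With a toy denoiser that only responds to one image region, the returned map concentrates on that region, which is the qualitative behavior a saliency tool like this is meant to surface.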