Feb 2024 | Ji-Hoon Park, Yeong-Joon Ju, Seong-Whan Lee
This paper presents a visual analysis approach for interpreting the diffusion process of generative diffusion models, with the aim of making the model's decision-making more understandable to humans. The authors pose three research questions that examine the diffusion process in terms of the visual concepts the model generates and the regions it attends to at each time step, and they develop visualization tools to answer them. Through experiments with these tools, they show how the output is generated progressively, explaining the level of denoising and its relationship to foundational visual concepts at each time step. First, they examine spatial recovery levels to identify which regions the model focuses on during denoising, in terms of both semantic content and level of detail. They show that the denoising model begins image recovery in regions containing semantic information and then progresses toward regions with finer-grained details. Second, they explore how specific concepts are highlighted at each denoising step by aligning generated images with the prompts used to produce them. By observing the internal flow of the diffusion process, they show how the model predicts a particular visual concept at each denoising step to complete the final image. Finally, they extend the analysis to decode the visual concepts embedded across all time steps. During training, the diffusion model learns different visual concepts for each time step, which lets it predict different levels of visual concepts at different stages. They validate their tools using the Area Under the Curve (AUC) score, correlation quantification, and cross-attention mapping.
Their findings provide insights into the diffusion process and pave the way for further research into explainable diffusion mechanisms.
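The summary names an AUC score but does not say which curve it is computed over. As a rough illustration only, the sketch below computes a normalized area under a curve with the trapezoidal rule, assuming the curve is a list of scores sampled at evenly spaced points (for example, a quality score recorded as progressively more of an image is perturbed). The function name `area_under_curve` and the sample values are hypothetical and are not taken from the paper.

```python
def area_under_curve(scores):
    """Trapezoidal area under a curve sampled at evenly spaced x in [0, 1].

    Assumption: `scores` holds at least two y-values measured at equal
    intervals. This is an illustrative helper, not the paper's metric.
    """
    values = [float(s) for s in scores]
    if len(values) < 2:
        raise ValueError("need at least two points to form a curve")
    step = 1.0 / (len(values) - 1)  # spacing between consecutive x-values
    # Sum the area of each trapezoid between neighbouring samples.
    return sum((a + b) * step / 2.0 for a, b in zip(values, values[1:]))


# Hypothetical curve: a score that rises linearly from 0 to 1.
linear_curve = [0.0, 0.25, 0.5, 0.75, 1.0]
linear_area = area_under_curve(linear_curve)
```

Normalizing the x-axis to [0, 1] keeps scores comparable across curves sampled at different resolutions, which is one reason this form is common for perturbation-style evaluations.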