On Mechanistic Knowledge Localization in Text-to-Image Generative Models

8 May 2024 | Samyadeep Basu*¹, Keivan Rezaei*¹, Priyatham Kattakinda¹, Ryan Rossi², Cherry Zhao², Vlad Morariu², Varun Manjunatha², Soheil Feizi¹
The paper "On Mechanistic Knowledge Localization in Text-to-Image Generative Models" explores the localization of knowledge within text-to-image models, particularly focusing on the UNet and CLIP text-encoder. Recent work using causal tracing has shown that early Stable-Diffusion variants primarily localize knowledge in the first layer of the CLIP text-encoder, while it diffuses throughout the UNet. However, this approach fails to pinpoint localized knowledge in more recent models like SD-XL and DeepFloyd, highlighting challenges in model editing. To address this issue, the authors introduce the concept of *mechanistic localization*, which identifies a small fraction of layers in the UNet that control various visual attributes such as "style," "objects," and "facts." They propose LocoGen, a method that measures the direct effect of intermediate layers on output generation by performing interventions in the cross-attention layers of the UNet. LocoGen is then used with LocoEDIT, a closed-form editing method, to edit popular open-source text-to-image models, including SD-XL. The paper demonstrates that LocoGen can universally identify layers controlling visual attributes across different models, and LocoEDIT effectively edits these layers to remove styles, modify objects, and update facts. Notably, for certain attributes like "style," knowledge can be traced and edited to a subset of neurons, highlighting the potential for neuron-level model editing. The contributions of the paper include: - Highlighting the limitations of existing interpretability methods like causal tracing for recent text-to-image models. - Introducing LocoGen for universal knowledge localization. - Showcasing the effectiveness of LocoEDIT for model editing across various text-to-image models. The paper also includes empirical results and a human-study to validate the effectiveness of LocoGen and LocoEDIT, demonstrating their ability to accurately identify and edit specific visual attributes in text-to-image models.The paper "On Mechanistic Knowledge Localization in Text-to-Image Generative Models" explores the localization of knowledge within text-to-image models, particularly focusing on the UNet and CLIP text-encoder. Recent work using causal tracing has shown that early Stable-Diffusion variants primarily localize knowledge in the first layer of the CLIP text-encoder, while it diffuses throughout the UNet. However, this approach fails to pinpoint localized knowledge in more recent models like SD-XL and DeepFloyd, highlighting challenges in model editing. To address this issue, the authors introduce the concept of *mechanistic localization*, which identifies a small fraction of layers in the UNet that control various visual attributes such as "style," "objects," and "facts." They propose LocoGen, a method that measures the direct effect of intermediate layers on output generation by performing interventions in the cross-attention layers of the UNet. LocoGen is then used with LocoEDIT, a closed-form editing method, to edit popular open-source text-to-image models, including SD-XL. The paper demonstrates that LocoGen can universally identify layers controlling visual attributes across different models, and LocoEDIT effectively edits these layers to remove styles, modify objects, and update facts. Notably, for certain attributes like "style," knowledge can be traced and edited to a subset of neurons, highlighting the potential for neuron-level model editing. 
The contributions of the paper include:

- Highlighting the limitations of existing interpretability methods, such as causal tracing, for recent text-to-image models.
- Introducing LocoGen for universal knowledge localization across text-to-image models.
- Showcasing the effectiveness of LocoEDIT for model editing across various text-to-image models.

Empirical results and a human study validate LocoGen and LocoEDIT, showing that they accurately identify and edit specific visual attributes; a simplified sketch of the kind of closed-form update involved in such editing is given below.
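As a rough illustration of how a closed-form edit at the identified layers could work, the following is a ridge-regression-style update in the spirit of prior cross-attention editing methods. The function name, toy dimensions, and regularization strength are assumptions for this sketch, and the exact objective LocoEDIT optimizes may differ.

```python
# Illustrative closed-form update for a cross-attention key/value projection.
# Assumption (not necessarily the authors' exact formulation): find W minimizing
#   ||C_src @ W.T - C_tgt @ W_old.T||_F^2 + lam * ||W - W_old||_F^2,
# i.e. make attribute-bearing prompts (C_src) map to what attribute-free
# prompts (C_tgt) produced under the original weights, with a ridge penalty.
import torch

def closed_form_edit(W_old: torch.Tensor, C_src: torch.Tensor,
                     C_tgt: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """W_old: (d_out, d_in) projection; C_src, C_tgt: (n, d_in) text embeddings."""
    d_in = W_old.shape[1]
    V_star = C_tgt @ W_old.T                                   # (n, d_out) target outputs
    A = C_src.T @ C_src + lam * torch.eye(d_in, dtype=W_old.dtype, device=W_old.device)
    B = C_src.T @ V_star + lam * W_old.T                       # (d_in, d_out)
    return torch.linalg.solve(A, B).T                          # updated W: (d_out, d_in)

# Toy usage with dimensions akin to Stable Diffusion's cross-attention projections.
W_old = torch.randn(320, 768)   # e.g. a to_k / to_v weight in an identified layer
C_src = torch.randn(8, 768)     # embeddings of prompts containing the attribute
C_tgt = torch.randn(8, 768)     # embeddings of attribute-free target prompts
W_new = closed_form_edit(W_old, C_src, C_tgt)
print(W_new.shape)              # torch.Size([320, 768])
```

Applied only to the key/value projections of the cross-attention layers that LocoGen singles out, an update of this kind removes or rewrites the targeted attribute while keeping the rest of the model, and unrelated prompts, largely unchanged.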