Latent Guard: A Safety Framework for Text-to-Image Generation

18 Aug 2024 | Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip Torr, and Fabio Pizzati
**Abstract:** Text-to-image (T2I) models can generate high-quality images, but they also pose the risk of producing inappropriate content. Existing safety measures, such as text blacklists or harmful-content classifiers, are either easily circumvented or require large datasets for training. Latent Guard is a framework designed to improve safety in T2I generation by learning a latent space on top of the text encoder, in which harmful concepts can be detected from input text embeddings. The framework combines a data generation pipeline, dedicated architectural components, and contrastive learning to train a model whose blacklist can be adapted at test time without retraining. Evaluations on three datasets against four baselines show that Latent Guard effectively blocks malicious input prompts, including blacklisted concepts expressed through synonyms and adversarial attacks.

**Introduction:** The rapid development of T2I models has transformed content creation, but it also introduces the risk of generating unsafe content. Existing safety measures, such as text blacklists or harmful-content classification, have limitations: blacklists match only exact wording, while classifiers require large amounts of labeled data. Latent Guard addresses these issues by detecting blacklisted concepts in a latent representation of the input text, allowing more flexible and effective safety checks.

**Related Work:** Text-to-image generation has evolved from GANs to diffusion models, with steady advances in scale and image quality. Existing safety measures include text blacklists, LLM-based recognition of harmful text, and classification of generated images. Latent Guard aims to improve on these by detecting blacklisted concepts in a learned latent space.

**The Latent Guard Framework:** Latent Guard learns a latent space on top of the frozen text encoder using contrastive learning, so that blacklisted concepts and the unsafe prompts that contain them are mapped close together. A trainable architectural component extracts latent representations from the text embeddings, and a contrastive loss enforces that unsafe prompts land near the embeddings of the concepts they express.
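To make the training idea concrete, the following is a minimal, hypothetical PyTorch sketch of a contrastive objective of this kind. It is not the paper's implementation: Latent Guard's actual architectural components, feature dimensions, and loss details differ, and names such as `LatentHead`, `contrastive_loss`, and `train_step` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn

class LatentHead(nn.Module):
    """Illustrative projection head mapping frozen text-encoder features into
    a safety latent space (a simple MLP stand-in, not the paper's module)."""
    def __init__(self, dim_in=768, dim_latent=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, dim_latent),
            nn.ReLU(),
            nn.Linear(dim_latent, dim_latent),
        )

    def forward(self, text_features):
        # L2-normalize so that dot products are cosine similarities.
        return F.normalize(self.net(text_features), dim=-1)

def contrastive_loss(concept_z, prompt_z, temperature=0.07):
    """InfoNCE-style loss: the i-th concept is pulled toward the i-th unsafe
    prompt that contains it and pushed away from the other prompts."""
    logits = concept_z @ prompt_z.t() / temperature          # (B, B) similarities
    targets = torch.arange(len(concept_z), device=logits.device)
    return F.cross_entropy(logits, targets)

# Only the projection head is optimized; the T2I text encoder stays frozen.
head = LatentHead()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)

def train_step(concept_feats, unsafe_prompt_feats):
    loss = contrastive_loss(head(concept_feats), head(unsafe_prompt_feats))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random features standing in for frozen text-encoder outputs.
concepts = torch.randn(8, 768)
prompts = torch.randn(8, 768)
print(train_step(concepts, prompts))
```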
**Experiments:** Latent Guard is evaluated on CoPro, a generated dataset of unsafe and safe prompts, and compared against four baselines. Results show that Latent Guard outperforms existing methods at detecting unsafe prompts, even when blacklisted concepts are expressed through synonyms or adversarial attacks. The framework is efficient and can be integrated into existing T2I pipelines at minimal computational cost: at inference time, the input prompt is checked against the current blacklist in the learned latent space before generation, as sketched below.
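The sketch below illustrates how such a check could sit in front of an existing pipeline, reusing the `head` projection from the training sketch above. It is a hypothetical sketch, not the paper's interface: `encode_text`, `generate_image`, and the 0.6 threshold are placeholders, and in practice the threshold would be calibrated on held-out data.

```python
import torch

def embed(texts, encode_text, head):
    # encode_text stands in for the frozen T2I text encoder (e.g., CLIP).
    with torch.no_grad():
        return head(encode_text(texts))                    # (N, d), L2-normalized

def is_unsafe(prompt, blacklist_z, encode_text, head, threshold=0.6):
    """Flag the prompt if it is too close to any blacklisted concept
    in the learned latent space (illustrative threshold)."""
    prompt_z = embed([prompt], encode_text, head)          # (1, d)
    sims = prompt_z @ blacklist_z.t()                      # cosine similarities
    return sims.max().item() > threshold

def safe_generate(prompt, blacklist, encode_text, head, generate_image):
    # The blacklist can change at test time: re-embedding the new concepts
    # is enough, no retraining of the head is required.
    blacklist_z = embed(blacklist, encode_text, head)
    if is_unsafe(prompt, blacklist_z, encode_text, head):
        raise ValueError("Prompt blocked: blacklisted concept detected.")
    return generate_image(prompt)

# Toy usage with random stand-ins (real use would pass the frozen T2I text
# encoder and the trained head from the training sketch above).
fake_encoder = lambda texts: torch.randn(len(texts), 768)
demo_head = LatentHead()                                   # from the training sketch
blacklist_z = embed(["gore", "weapons"], fake_encoder, demo_head)
print(is_unsafe("a peaceful landscape", blacklist_z, fake_encoder, demo_head))
```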
**Conclusion:** Latent Guard is a novel safety framework for T2I models that improves on existing methods by detecting blacklisted concepts in input prompts. It offers robust detection and good generalization across datasets and concepts.