Latent Guard: a Safety Framework for Text-to-image Generation

18 Aug 2024 | Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip Torr, and Fabio Pizzati
Latent Guard is a safety framework for text-to-image (T2I) generation that strengthens safety measures by detecting harmful concepts in the embeddings of input text. Inspired by blacklist-based approaches, it learns a latent space on top of the T2I model's text encoder in which blacklisted concepts can be identified. The framework combines a data generation pipeline built on large language models, ad-hoc architectural components, and contrastive learning to exploit the generated data, and it is evaluated on three datasets against four baselines. Because detection operates on a latent representation of the input text rather than on its exact wording, Latent Guard resists rephrasing and optimization attacks targeting the textual encoder, and its blacklist can be adapted at test time without retraining.

At its core, the framework uses an Embedding Mapping Layer to extract latent representations of blacklisted concepts and input prompts, enabling the detection of unsafe content. Training relies on a contrastive objective that minimizes the distance between the embeddings of unsafe prompts and the concepts they contain, while pushing safe prompts away from those concepts.
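To make the mechanism concrete, the sketch below shows how such a mapping head and contrastive objective could look in PyTorch. It is a minimal illustration, not the paper's implementation: the MLP body, the dimensions, and the margin-based loss are assumptions standing in for the paper's exact architecture and training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingMappingLayer(nn.Module):
    """Illustrative mapping head: projects pooled embeddings from the frozen
    T2I text encoder into the learned safety latent space. The width, depth,
    and latent dimension here are assumptions, not the paper's architecture."""
    def __init__(self, in_dim: int = 768, latent_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, in_dim),
            nn.ReLU(),
            nn.Linear(in_dim, latent_dim),
        )

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, in_dim) pooled embedding from the frozen text encoder.
        # Unit-normalizing lets us use cosine distance in the latent space.
        return F.normalize(self.net(text_emb), dim=-1)

def contrastive_loss(prompt_z: torch.Tensor,
                     concept_z: torch.Tensor,
                     is_unsafe: torch.Tensor,
                     margin: float = 0.5) -> torch.Tensor:
    """Margin-based contrastive objective used as a stand-in for the paper's
    loss: pull unsafe prompts toward their blacklisted concept, push safe
    prompts at least `margin` away. is_unsafe is a float tensor of 0/1."""
    d = 1.0 - (prompt_z * concept_z).sum(dim=-1)        # cosine distance, (batch,)
    pos = is_unsafe * d.pow(2)                          # attract unsafe pairs
    neg = (1.0 - is_unsafe) * F.relu(margin - d).pow(2) # repel safe pairs
    return (pos + neg).mean()
```

Keeping the text encoder frozen and training only the mapping head is what keeps the method cheap and non-invasive with respect to the existing T2I pipeline.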
During inference, Latent Guard measures distances in the learned latent space between the input prompt and each blacklisted concept, flagging the prompt as unsafe if it falls close to any of them. Experiments show that Latent Guard outperforms the baselines in detecting unsafe prompts, achieving high accuracy and AUC across datasets, and it generalizes robustly across distributions, detecting concepts even when they are not explicitly included in the blacklist. The framework is efficient, with low computational requirements and minimal impact on existing T2I pipelines, and it supports flexible testing scenarios, since updating the blacklist only requires embedding the new concepts rather than retraining.
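At inference time, detection thus reduces to a nearest-concept distance check, as in the hedged sketch below. It assumes the components above; the wrapper `text_encoder`, the helper name, and the threshold value are illustrative placeholders, with the threshold meant to be calibrated on a validation set.

```python
@torch.no_grad()
def is_prompt_unsafe(prompt: str,
                     blacklist_z: torch.Tensor,
                     text_encoder,
                     mapper: EmbeddingMappingLayer,
                     tau: float = 0.35) -> bool:
    """Flag a prompt if its latent lies within cosine distance tau of any
    blacklisted concept. blacklist_z: (num_concepts, latent_dim) latents,
    precomputed once by passing each concept through the same encoder+mapper."""
    emb = text_encoder(prompt)        # stand-in for the frozen T2I text encoder, (1, in_dim)
    z = mapper(emb)                   # (1, latent_dim), unit norm
    dists = 1.0 - z @ blacklist_z.T   # cosine distances to every concept, (1, num_concepts)
    return bool((dists < tau).any())
```

Precomputing `blacklist_z` once per concept is what makes test-time blacklist updates cheap: adding or removing a concept changes a lookup table, not the model.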