DreamIdentity is a method for efficient face-identity-preserved image generation. It aims to enhance the editability of pre-trained text-to-image (T2I) models so that they preserve a given face identity while following text prompts. The key idea is to learn edit-friendly and accurate face-identity representations in the word embedding space. To make the projected embeddings edit-friendly, the method proposes self-augmented editability learning, which constructs training pairs of generated celebrity faces and corresponding edited celebrity images. In addition, a dedicated face-identity encoder learns an accurate representation of human faces: it extracts multi-scale ID-aware features and feeds them to a multi-embedding projector that produces pseudo words directly in the text embedding space. Extensive experiments show that the method generates more text-coherent and ID-preserved images with negligible time overhead over the standard T2I generation process.
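As a rough illustration of the pairing step, the T2I model can be prompted with a celebrity name to obtain a clean source face and, with edit phrases appended, the corresponding edited targets. The sketch below is a minimal, hypothetical version using Hugging Face diffusers with Stable Diffusion as a stand-in for the foundation model; the model id, prompt templates, and edit list are our assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of self-augmented data construction: the T2I model itself
# synthesizes a celebrity face plus edited images of the same celebrity,
# yielding (source face, edit prompt, edited target) training triples.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

celebrities = ["Albert Einstein"]      # identities the T2I model already knows
edits = ["wearing a red hat", "as an oil painting", "in the snow"]

triples = []
for name in celebrities:
    source = pipe(f"a close-up photo of the face of {name}").images[0]
    for edit in edits:
        target = pipe(f"a photo of {name} {edit}").images[0]
        # At training time the encoder embeds `source`, and the prompt uses
        # the pseudo word S* in place of the celebrity name.
        triples.append((source, f"a photo of S* {edit}", target))
```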
In more detail, DreamIdentity is an encoder-based approach that learns edit-friendly and accurate representations in the projected word embedding space, enabling efficient face-identity-preserved generation with high editability. Self-augmented editability learning brings the editing task into the training phase: it exploits the T2I model itself to construct a self-augmented dataset by generating celebrity faces along with a wide range of target-edited images of the same celebrities. For identity encoding, a dedicated Multi-word Multi-scale ID encoder, named the $ M^{2} $ ID encoder, is designed: a ViT-based network pre-trained on a large-scale face dataset, from whose multi-scale, coarse-to-fine features multi-word embeddings are projected. The $ M^{2} $ ID encoder is then trained on a combination of the self-augmented dataset and a conventional face-only dataset to learn the edit-friendly and accurate word embedding $ S^{*} $.
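To make the multi-embedding projection concrete, the following PyTorch sketch shows one plausible reading: each tapped ViT scale gets its own linear head that emits one or more word embeddings, and their concatenation forms the pseudo words $ S^{*} $. The dimensions, the number of tapped scales, and one word per scale are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a multi-scale, multi-embedding projector (not the
# authors' code): per-scale ViT features -> pseudo-word embeddings S*.
import torch
import torch.nn as nn


class MultiEmbeddingProjector(nn.Module):
    def __init__(self, feat_dim: int = 768, text_dim: int = 768,
                 num_scales: int = 3, words_per_scale: int = 1):
        super().__init__()
        self.words_per_scale = words_per_scale
        self.text_dim = text_dim
        # one linear head per tapped ViT scale (coarse to fine)
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, text_dim * words_per_scale)
            for _ in range(num_scales)
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats[i]: (B, feat_dim) feature tapped from one ViT layer
        words = [
            head(f).view(-1, self.words_per_scale, self.text_dim)
            for head, f in zip(self.heads, feats)
        ]
        # (B, num_scales * words_per_scale, text_dim): the pseudo words S*
        return torch.cat(words, dim=1)


# Toy usage: features from three ViT layers for a batch of two face crops.
projector = MultiEmbeddingProjector()
s_star = projector([torch.randn(2, 768) for _ in range(3)])
print(s_star.shape)  # torch.Size([2, 3, 768])
```

The projected embeddings can then replace a placeholder token's embedding in the frozen text encoder's input sequence, which lets the same pipeline serve both the face-only reconstruction data and the self-augmented editing pairs.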
The main contributions of the work are: (1) Conceptually, we point out that current encoder-based methods fall short of high editability because their word-embedding representations are reconstruction-biased and inaccurate. (2) Technically, for an edit-friendly representation, we introduce self-augmented editability learning, which uses the foundation T2I model itself to generate a high-quality editing dataset; for an accurate representation, we propose the dedicated $ M^{2} $ ID encoder with multi-scale features and multi-embedding projection. (3) Experimentally, extensive results demonstrate the superiority of our method, which efficiently achieves flexible text-guided generation while preserving high ID similarity.
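For concreteness, the training objective with such pseudo words is presumably the standard latent-diffusion denoising loss, with the text condition carrying $ S^{*} $ in place of a subject name (the notation below is ours, and the paper may add auxiliary terms):

$$
\mathcal{L} \;=\; \mathbb{E}_{z,\, y,\, \epsilon \sim \mathcal{N}(0, I),\, t}
\Big[\, \big\| \epsilon - \epsilon_{\theta}\big(z_{t},\, t,\, c(y, S^{*})\big) \big\|_{2}^{2} \,\Big],
$$

where $ z_{t} $ is the noised latent of the target image at timestep $ t $, $ \epsilon_{\theta} $ is the denoising network, and $ c(y, S^{*}) $ is the text encoding of prompt $ y $ with $ S^{*} $ inserted; under this reading, the T2I model stays frozen and only the $ M^{2} $ ID encoder receives gradients.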