**DreamIdentity: Enhanced Editability for Efficient Face-Identity Preserved Image Generation**
DreamIdentity is an approach for improving the editability of text-to-image (T2I) models while preserving the identity of an input face. The central challenge is to generate diverse, high-quality images that follow a textual prompt and still depict the same person as the input face. Existing methods often struggle with both identity preservation and text coherence, leading to suboptimal results.
**Key Contributions:**
1. **Self-Augmented Editability Learning:** This technique involves using the T2I model itself to generate a dataset of celebrity faces and edited images, which is then used to train a dedicated encoder to improve editability.
2. **Multi-word Multi-scale ID Encoder ($M^2$ ID Encoder):** This encoder extracts multi-scale features from a ViT backbone and projects them into multiple word embeddings, ensuring accurate and edit-friendly representation of the face identity.
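The data-construction idea behind the first contribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `fake_t2i` is a hypothetical stand-in for the pre-trained T2I model, and the identity names, prompt templates, and record fields are invented placeholders.

```python
from itertools import product


def fake_t2i(prompt):
    # Hypothetical stand-in for a pre-trained T2I model (assumption):
    # it returns a text tag instead of an actual image.
    return f"<image for: {prompt}>"


def build_self_augmented_pairs(names, edit_templates, generate):
    """Pair a plain face image with an edited image of the same identity.

    Both images come from the same generator, which is what makes the
    dataset "self-augmented" in this sketch.
    """
    records = []
    for name, template in product(names, edit_templates):
        source_prompt = f"a photo of {name}"
        edit_prompt = template.format(name)
        records.append({
            "identity": name,
            "source_prompt": source_prompt,
            "edit_prompt": edit_prompt,
            "source_image": generate(source_prompt),
            "target_image": generate(edit_prompt),
        })
    return records


# Placeholder identities and edit templates (assumptions for illustration).
names = ["Celebrity A", "Celebrity B"]
edit_templates = ["{} as a firefighter", "{} in an oil painting"]
pairs = build_self_augmented_pairs(names, edit_templates, fake_t2i)
```

Each record ties a source face to an edited target of the same identity, which is the kind of supervision an encoder would need in order to learn embeddings that remain editable.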
**Methodology:**
- **Training Pipeline:** The input face image is encoded into multi-word embeddings using the $M^2$ ID encoder. These embeddings are then combined with text prompts to generate images that align with the text and preserve the face identity.
- **Self-Augmented Editability Learning:** A self-augmented dataset is constructed by generating celebrity faces and edited images using the pre-trained T2I model. This dataset is used to train the $M^2$ ID encoder with an editability objective.
- **$M^2$ ID Encoder:** The encoder uses multi-scale features and multiple word embeddings to accurately represent the face identity, enhancing both identity preservation and editability.
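The encoding step described above can be illustrated with a small, dependency-free sketch. It assumes that the multi-scale features are fused by concatenation, that each word embedding has its own linear projection, and that the resulting embeddings replace a single placeholder token in the prompt. None of these details are confirmed by the summary, and the random weights stand in for learned parameters.

```python
import random


def make_projection(in_dim, out_dim, rng):
    # Random weights standing in for a learned linear layer (assumption).
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(weights, vector):
    # Plain matrix-vector product over nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def encode_identity(scale_features, projections):
    """Fuse per-scale feature vectors and map them to several word embeddings."""
    fused = [value for feature in scale_features for value in feature]
    return [matvec(weights, fused) for weights in projections]


def splice_into_prompt(token_embeddings, placeholder_index, id_embeddings):
    """Replace one placeholder token with the identity word embeddings."""
    return (token_embeddings[:placeholder_index]
            + id_embeddings
            + token_embeddings[placeholder_index + 1:])


rng = random.Random(0)
num_scales, feat_dim, embed_dim, num_words = 3, 4, 8, 2

# Hypothetical features taken from three depths of a vision backbone.
scale_features = [[rng.random() for _ in range(feat_dim)] for _ in range(num_scales)]
projections = [make_projection(num_scales * feat_dim, embed_dim, rng)
               for _ in range(num_words)]
id_embeddings = encode_identity(scale_features, projections)

# A five-token prompt whose third token is the identity placeholder.
prompt_tokens = [[0.0] * embed_dim for _ in range(5)]
conditioned = splice_into_prompt(prompt_tokens, 2, id_embeddings)
```

Using several word embeddings rather than one gives the identity more capacity in the text-conditioning space, while the prompt tokens around the placeholder stay untouched and can still steer the edit.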
**Experiments:**
- **Evaluation Metrics:** The method is evaluated on text alignment and face similarity, and is reported to outperform existing methods in editability, identity preservation, and encoding speed.
- **Ablation Studies:** Ablation studies demonstrate the effectiveness of the self-augmented editability learning and the $M^2$ ID encoder.
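Metrics of this kind are commonly computed as cosine similarities between embeddings. The sketch below assumes that convention: face similarity compares identity embeddings of the input and generated faces, and text alignment compares a prompt embedding with an image embedding. The summary does not name the embedding models or the exact formula, so the vectors here are made-up placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def mean_similarity(pairs):
    """Average cosine similarity over (embedding, embedding) pairs."""
    scores = [cosine_similarity(a, b) for a, b in pairs]
    return sum(scores) / len(scores)


# Placeholder embeddings (assumptions): input face vs. generated face.
face_pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
face_similarity = mean_similarity(face_pairs)
```

The same helper would serve for text alignment by passing pairs of prompt and image embeddings from a shared text-image embedding space.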
**Conclusion:**
DreamIdentity addresses the challenge of generating high-quality, identity-preserved images from a single facial image by leveraging self-augmented editability learning and a specialized ID encoder. Extensive experiments validate the effectiveness of the proposed method, making it a significant advancement in the field of text-to-image generation.