DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation


15 Mar 2023 | Nataniel Ruiz, Yael Pritch, Yuanzhen Li, Michael Rubinstein, Varun Jampani, Kfir Aberman

**Affiliation:** Google Research; Boston University

**Abstract:** Large text-to-image models have achieved remarkable advances in generating high-quality, diverse images from text prompts. However, these models cannot mimic a specific subject from reference images and synthesize novel renditions of it in different contexts. This paper introduces DreamBooth, a method for personalizing text-to-image diffusion models. Given a few images of a subject, DreamBooth fine-tunes a pre-trained model to bind a unique identifier to that subject. The model can then generate novel photorealistic images of the subject in various scenes, poses, and lighting conditions while maintaining high fidelity to the subject's key visual features. The technique leverages a new autogenous class-specific prior preservation loss to prevent language drift and encourage diverse instance generation. The method is applied to tasks such as subject recontextualization, text-guided view synthesis, and artistic rendering, preserving the subject's key features throughout. A new dataset and evaluation protocol are also introduced to assess subject and prompt fidelity.

**Introduction:** DreamBooth addresses the challenge of generating novel images of a subject in different contexts while preserving its distinctive features. Traditional methods struggle with diverse datasets and novel scenes, whereas DreamBooth enables generation of subjects in new poses and contexts. The approach fine-tunes a text-to-image diffusion model using a few images of the subject together with a unique identifier. A class-specific prior preservation loss is proposed to prevent language drift and encourage diverse instance generation.
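To make the identifier-plus-class-noun prompting concrete, here is a minimal sketch of how an instance prompt (with the unique identifier) and a class prompt (without it, used to sample the frozen model's prior) might be built. The `sks` token, the helper name, and the phrasing are illustrative assumptions, not specifics from the paper:

```python
def make_prompts(identifier: str, class_noun: str, context: str = "") -> tuple[str, str]:
    """Build a DreamBooth-style prompt pair (illustrative sketch).

    instance_prompt: binds the rare identifier token to the subject's class noun.
    class_prompt:    omits the identifier, so the pre-trained model's own class
                     prior can be sampled for the prior preservation loss.
    """
    instance_prompt = f"a {identifier} {class_noun} {context}".strip()
    class_prompt = f"a {class_noun} {context}".strip()
    return instance_prompt, class_prompt


# Example: a rare token such as "sks" paired with the class noun "dog".
pair = make_prompts("sks", "dog", "on the beach")
```

The key design point is that the identifier should be a token with a weak existing prior in the language model, so fine-tuning can attach the subject's appearance to it without colliding with an established word meaning.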
Experiments demonstrate the effectiveness of DreamBooth in various applications, including recontextualization, property modification, and artistic rendering, with high subject and prompt fidelity.

**Methods:** DreamBooth fine-tunes a pre-trained text-to-image diffusion model using a few images of the subject paired with a unique identifier. A class-specific prior preservation loss is introduced to prevent language drift and encourage diverse instance generation. The method is evaluated using a new dataset and evaluation protocol, showing superior subject and prompt fidelity compared to existing methods.

**Experiments:** DreamBooth is evaluated on a dataset of 30 subjects, including objects and live subjects such as pets. The method is applied to tasks such as recontextualization, property modification, and artistic rendering, demonstrating high subject and prompt fidelity. A user study confirms the effectiveness of DreamBooth in preserving the identity and essence of the subject.

**Limitations:** DreamBooth has limitations, including difficulty in generating accurate contexts for some subjects and potential overfitting to the training set. Some subjects are easier to learn than others, and the fidelity of generated images can vary.

**Conclusion:** DreamBooth provides a novel approach to generating renditions of a subject in diverse contexts while preserving its key visual features.
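The two-term objective described in Methods combines a reconstruction loss on the subject images with a prior preservation term on samples generated by the frozen model for the bare class prompt. The sketch below is a simplified NumPy stand-in for the actual diffusion denoising losses; the function name, array shapes, and the weight `lam` are illustrative assumptions:

```python
import numpy as np


def dreambooth_loss(pred_instance, target_instance,
                    pred_prior, target_prior, lam=1.0):
    """Simplified DreamBooth training objective (sketch).

    instance_term: pulls the fine-tuned model toward the few subject images
                   (prompted with the unique identifier).
    prior_term:    keeps outputs for the plain class prompt close to samples
                   from the frozen pre-trained model, mitigating language
                   drift and loss of class diversity; lam trades off the two.
    """
    instance_term = np.mean((pred_instance - target_instance) ** 2)
    prior_term = np.mean((pred_prior - target_prior) ** 2)
    return instance_term + lam * prior_term
```

In a real setup both terms would be denoising losses on noised latents at sampled timesteps, but the structure (subject reconstruction plus a weighted class-prior term) is the same.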