15 Mar 2023 | Nataniel Ruiz, Yael Pritch, Yuanzhen Li, Michael Rubinstein, Varun Jampani, Kfir Aberman
DreamBooth is a method for fine-tuning text-to-image diffusion models to generate images of a specific subject in new contexts. Given only a few images of the subject, the model is fine-tuned so that it can synthesize novel, photorealistic images of that subject with high fidelity. The key idea is to bind the subject to a unique identifier in the text prompt, allowing the model to preserve the subject's key visual features while adapting to different scenes, poses, and lighting conditions. A class-specific prior preservation loss counteracts language drift and keeps the outputs diverse: alongside the reconstruction loss on the subject images, it applies a standard denoising loss to class images generated by the frozen pretrained model.

DreamBooth is applied to a range of tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering. Evaluated on a new dataset and protocol, it outperforms alternative approaches, and in a direct comparison with Textual Inversion it achieves higher subject fidelity and prompt fidelity. The results show that a few input images are enough to generate diverse, realistic renditions of a subject in new contexts while preserving its identity and details, making DreamBooth a powerful tool for subject-driven generation.
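To make the training objective concrete, the sketch below shows one fine-tuning step in PyTorch-style code: a denoising loss on the subject images (captioned with the rare identifier, e.g. "a [V] dog") plus the prior-preservation term on class images pre-generated by the frozen model ("a dog"). This is a minimal illustration, not the authors' released implementation; the callables `unet`, `text_encoder`, and `add_noise`, the latent inputs, and the `prior_weight` hyperparameter are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def dreambooth_step(unet, text_encoder, add_noise,
                    subject_latents, subject_prompt_ids,
                    prior_latents, prior_prompt_ids,
                    num_train_timesteps=1000, prior_weight=1.0):
    """One DreamBooth-style training step (illustrative sketch).

    subject_latents / subject_prompt_ids: the few subject images and their
        prompt containing the unique identifier ("a [V] dog").
    prior_latents / prior_prompt_ids: class images generated by the frozen
        pretrained model and their generic class prompt ("a dog").
    """

    def denoising_loss(latents, prompt_ids):
        noise = torch.randn_like(latents)
        # Sample a random diffusion timestep per example.
        t = torch.randint(0, num_train_timesteps, (latents.shape[0],),
                          device=latents.device)
        noisy = add_noise(latents, noise, t)   # forward diffusion q(x_t | x_0)
        cond = text_encoder(prompt_ids)        # text conditioning
        pred = unet(noisy, t, cond)            # model predicts the added noise
        return F.mse_loss(pred, noise)

    # Subject reconstruction term + class-specific prior preservation term.
    return (denoising_loss(subject_latents, subject_prompt_ids)
            + prior_weight * denoising_loss(prior_latents, prior_prompt_ids))
```

The weighting factor `prior_weight` trades off fidelity to the few subject images against retention of the pretrained class prior; setting it to zero recovers naive fine-tuning, which is prone to language drift and reduced output diversity.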