LCM-Lookahead for Encoder-based Text-to-Image Personalization


4 Apr 2024 | Rinon Gal*, Tel Aviv University and NVIDIA, Israel; Or Lichter*, Tel Aviv University, Israel; Elad Richardson*, Tel Aviv University, Israel; Or Patashnik, Tel Aviv University, Israel; Amit H. Bermano, Tel Aviv University, Israel; Gal Chechik, NVIDIA, Israel; Daniel Cohen-Or, Tel Aviv University, Israel
The paper introduces LCM-Lookahead, a novel mechanism that leverages fast-sampling methods to apply image-space losses when training encoder-based text-to-image personalization models. Using a latent consistency model (LCM), it produces high-quality previews of the denoised output during training, enabling image-space objectives such as an identity loss. Focusing on facial identities, the authors propose a lookahead identity loss and an extended self-attention mechanism to improve both identity fidelity and prompt alignment in personalization encoders. They additionally generate a consistent training dataset containing repeated identities rendered in varying styles. The method is evaluated through a range of experiments, showing superior qualitative and quantitative results compared with prior and concurrent work. The paper also discusses limitations and ethical concerns, emphasizing the need for further improvement and responsible use.
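The core idea summarized above can be sketched in a few lines: rather than back-propagating an image-space loss through a full diffusion sampling chain, a latent consistency model predicts a clean "preview" of the final image in a single step, and the identity loss is computed on that preview. The following toy sketch illustrates the training-step shape only; all function names and the linear stand-in models are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of the LCM-Lookahead loss, under assumed stand-in models.

def lcm_preview(noisy_latent, timestep):
    """Stand-in for a one-step LCM prediction of the clean latent x0.

    A real LCM is a learned network mapping (noisy latent, t) to an
    approximate denoised output; here we use a toy linear rule.
    """
    return [v * (1.0 - timestep) for v in noisy_latent]

def identity_embedding(image):
    """Stand-in for a face-recognition embedding (e.g., an ArcFace-style net)."""
    return [v * 0.5 for v in image]

def identity_loss(preview, reference):
    """Distance between identity embeddings of the preview and the reference.

    The paper's loss compares face-recognition features; here we use a toy
    mean-squared distance between the stand-in embeddings.
    """
    a = identity_embedding(preview)
    b = identity_embedding(reference)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Training-step skeleton: preview the denoised output with the LCM,
# then score identity preservation in image space.
noisy_latent = [0.8, -0.2, 0.5]
reference_image = [0.4, -0.1, 0.25]
preview = lcm_preview(noisy_latent, timestep=0.5)
loss = identity_loss(preview, reference_image)
```

In a real pipeline the preview would be decoded to pixel space before the identity network, and the loss gradient would flow back into the personalization encoder; the sketch only shows where the lookahead sits in the loop.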