[slides and audio] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

**ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization**

**Authors:** Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, Zeynep Akata

**Abstract:** Text-to-Image (T2I) models have made significant advancements but still struggle with accurately capturing intricate details in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from "reward hacking" and may not generalize well to unseen prompt distributions. This work introduces Reward-based Noise Optimization (ReNO), a novel approach that enhances T2I models at inference by optimizing the initial noise based on human preference reward models. ReNO significantly improves performance on four different one-step models across two benchmarks, T2I-CompBench and GenEval. Within a computational budget of 20-50 seconds, ReNO-enhanced models consistently surpass the performance of all current open-source T2I models. Extensive user studies demonstrate that ReNO is preferred nearly twice as often as the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters. ReNO also outperforms widely used open-source models such as SDXL and PixArt-α, highlighting its efficiency and effectiveness in enhancing T2I model performance at inference time.

**Key Contributions:**

1. **ReNO Approach:** ReNO optimizes the initial noise based on human preference reward models, improving the quality and faithfulness of generated images.
2. **Performance Improvements:** ReNO significantly enhances the performance of four different one-step T2I models on the T2I-CompBench and GenEval benchmarks.
3. **User Studies:** ReNO is preferred nearly twice as often as the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters.
4. **Efficiency:** ReNO achieves image generation, including noise optimization, in 20-50 seconds, making it suitable for practical applications.

**Related Work:**

- **Initial Noise Optimization:** Previous methods focus on controlling generated samples for specific applications, while ReNO is designed to generally improve T2I models without additional techniques to mitigate exploding or vanishing gradients.
- **Reward Optimization:** While previous works explore incorporating reward models to enhance T2I generation, ReNO focuses on adapting a diffusion model during inference by optimizing the initial latent noise using a differentiable objective.

**Experiments:**

- **Setup:** ReNO is evaluated using four open-source one-step image generation models: SD-Turbo, SDXL-Turbo, PixArt-α DMD, and HyperSDXL.
- **Results:** ReNO achieves strong improvements in attribute binding, object relationships, and complex compositions, outperforming other models in both quantitative and qualitative evaluations.
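
The core idea summarized above, keeping the generator frozen and running gradient ascent on the initial noise to increase a differentiable reward, can be illustrated with a small toy sketch. This is not the authors' implementation: `generate` (an elementwise `tanh`), `reward` (a negative squared error to a fixed target), the step size, and the step count are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. In a real setup, the generator would be a one-step T2I model and the reward a human preference model, with gradients obtained by automatic differentiation.

```python
import math
import random


def generate(noise):
    """Toy stand-in for a frozen one-step generator: image = tanh(noise)."""
    return [math.tanh(z) for z in noise]


def reward(image, target):
    """Toy stand-in for a differentiable reward: negative squared error."""
    return -sum((x - t) ** 2 for x, t in zip(image, target))


def reward_grad_wrt_noise(noise, target):
    """Analytic gradient of reward(generate(noise)) with respect to the noise."""
    grads = []
    for z, t in zip(noise, target):
        x = math.tanh(z)
        # d/dz [-(tanh(z) - t)^2] = -2 * (tanh(z) - t) * (1 - tanh(z)^2)
        grads.append(-2.0 * (x - t) * (1.0 - x * x))
    return grads


def optimize_noise(noise, target, steps=100, lr=0.2):
    """Gradient ascent on the noise only; the generator is never updated."""
    noise = list(noise)
    history = [reward(generate(noise), target)]
    for _ in range(steps):
        grads = reward_grad_wrt_noise(noise, target)
        noise = [z + lr * g for z, g in zip(noise, grads)]
        history.append(reward(generate(noise), target))
    return noise, history


rng = random.Random(0)
initial_noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
# Hypothetical "preferred" output that the toy reward scores against.
target = [0.5, -0.5, 0.25, -0.25, 0.0, 0.1, -0.1, 0.3]

optimized_noise, history = optimize_noise(initial_noise, target)
print(f"reward before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

Because only the noise vector changes, the generator's weights stay untouched, which is the property the summary contrasts with reward-based fine-tuning.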

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

6 Jun 2024 | Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, Zeynep Akata