**ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization**
**Authors:** Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, Zeynep Akata
**Abstract:**
Text-to-Image (T2I) models have made significant advancements but still struggle to capture intricate details in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from "reward hacking" and may not generalize well to unseen prompt distributions. This work introduces Reward-based Noise Optimization (ReNO), a novel approach that enhances T2I models at inference by optimizing the initial noise based on human preference reward models. ReNO significantly improves performance on four different one-step models across two benchmarks, T2I-CompBench and GenEval. Within a computational budget of 20-50 seconds, ReNO-enhanced models consistently surpass the performance of all current open-source T2I models. Extensive user studies demonstrate that ReNO is preferred nearly twice as often as the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters. ReNO also outperforms widely-used open-source models such as SDXL and PixArt-α, highlighting its efficiency and effectiveness in enhancing T2I model performance at inference time.
**Key Contributions:**
1. **ReNO Approach:** ReNO optimizes the initial noise based on human preference reward models, improving the quality and faithfulness of generated images.
2. **Performance Improvements:** ReNO significantly enhances the performance of four different one-step T2I models on T2I-CompBench and GenEval benchmarks.
3. **User Studies:** ReNO is preferred nearly twice as often as the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters.
4. **Efficiency:** ReNO achieves image generation, including noise optimization, in 20-50 seconds, making it suitable for practical applications.
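
The core idea in the first contribution, keeping the generator frozen and updating only the initial noise so that a reward increases, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `generate` is a tiny stand-in for a one-step T2I model, `reward` is a stand-in for a differentiable preference reward model, and the step count and learning rate are arbitrary. It is not the authors' implementation.

```python
import math
import random


def generate(noise, weights):
    """Toy stand-in for a frozen one-step generator: x_i = tanh(w_i * z_i)."""
    return [math.tanh(w * z) for w, z in zip(weights, noise)]


def reward(image, target):
    """Toy stand-in for a differentiable reward: negative squared error to a target."""
    return -sum((x - t) ** 2 for x, t in zip(image, target))


def reward_grad_wrt_noise(noise, weights, target):
    """Chain rule through the generator: dR/dz_i = -2 (x_i - t_i) (1 - x_i^2) w_i."""
    image = generate(noise, weights)
    return [-2.0 * (x - t) * (1.0 - x * x) * w
            for x, t, w in zip(image, target, weights)]


def optimize_noise(noise, weights, target, steps=50, lr=0.1):
    """Gradient ascent on the noise only; the generator weights are never updated."""
    noise = list(noise)
    best_noise = list(noise)
    best_reward = reward(generate(noise, weights), target)
    for _ in range(steps):
        grad = reward_grad_wrt_noise(noise, weights, target)
        noise = [z + lr * g for z, g in zip(noise, grad)]
        current = reward(generate(noise, weights), target)
        if current > best_reward:  # keep the best-scoring noise seen so far
            best_noise, best_reward = list(noise), current
    return best_noise, best_reward


rng = random.Random(0)
dim = 8
weights = [rng.uniform(0.5, 1.5) for _ in range(dim)]   # hypothetical frozen parameters
target = [rng.uniform(-0.8, 0.8) for _ in range(dim)]   # hypothetical "preferred" output
initial_noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]

initial_reward = reward(generate(initial_noise, weights), target)
final_noise, final_reward = optimize_noise(initial_noise, weights, target)
print(f"reward before: {initial_reward:.4f}, after: {final_reward:.4f}")
```

With a real one-step model, the hand-written gradient would presumably be replaced by automatic differentiation through the generator and the reward model; the single forward pass per step is what keeps each iteration cheap.
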
**Related Work:**
- **Initial Noise Optimization:** Previous methods focus on controlling generated samples for specific applications, while ReNO is designed to generally improve T2I models without additional techniques to mitigate exploding or vanishing gradients.
- **Reward Optimization:** While previous works explore incorporating reward models to enhance T2I generation, ReNO focuses on adapting a diffusion model during inference by optimizing the initial latent noise using a differentiable objective.
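
One way to write down the inference-time objective described in the bullets above is sketched here. The notation is assumed for illustration rather than taken from the summary: $z$ is the initial latent noise, $p$ the prompt, $G_\theta$ the frozen one-step generator, $R_i$ the differentiable reward models, and $\lambda_i$ hypothetical weights for combining them.

$$
z^{\star} = \arg\max_{z} \; \sum_{i=1}^{n} \lambda_i \, R_i\big(G_\theta(z, p),\, p\big)
$$

Only $z$ is optimized; the parameters $\theta$ stay fixed, which is what distinguishes this setup from reward-based fine-tuning.
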
**Experiments:**
- **Setup:** ReNO is evaluated using four open-source one-step image generation models: SD-Turbo, SDXL-Turbo, PixArt-α DMD, and HyperSDXL.
- **Results:** ReNO achieves strong improvements in attribute binding, object relationships, and complex compositions, outperforming other models in both quantitative and qualitative evaluations.