This paper introduces Deep Reward Tuning (DRTune), a method for training text-to-image diffusion models with differentiable reward functions. DRTune supervises the final output image directly and back-propagates through the iterative sampling process to the input noise. It resolves the depth-efficiency dilemma by stopping the gradient at the denoising network's input and training only a subset of sampling steps. Evaluated across a variety of reward models, DRTune consistently outperforms competing algorithms, particularly for low-level control signals.

The method is applied to fine-tune Stable Diffusion XL 1.0 (SDXL 1.0) to optimize Human Preference Score v2.1, yielding Favorable Diffusion XL 1.0 (FDXL 1.0). FDXL 1.0 substantially improves image quality over SDXL 1.0 and reaches quality comparable to Midjourney v5.2. The contributions of this work are twofold: DRTune, which efficiently supervises early denoising steps, and FDXL 1.0, a state-of-the-art open-source text-to-image generative model tuned on human preferences. The paper also reviews related work, analyzes the challenges of training diffusion models with reward functions, and demonstrates DRTune's effectiveness in improving image quality and convergence: experiments show that DRTune surpasses other reward-training methods in both image quality and convergence speed. The paper concludes that DRTune is an effective method for reward-based training of text-to-image diffusion models and highlights the broader potential of reward training for improving image generation quality.
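To make the gradient-stopping idea described above concrete, the following is a minimal PyTorch-style sketch, not the paper's implementation: the denoiser, the DDIM-like update coefficients (`alphas`, `sigmas`), the reward function, and the `train_steps` selection are all illustrative placeholders. The sketch shows the two ingredients summarized here: the gradient is stopped at every denoiser input, while the linear part of each sampling update keeps a differentiable path from the reward on the final sample back to the input noise, and only a chosen subset of steps is run with gradients enabled.

```python
import torch
import torch.nn as nn

def drtune_loss(denoiser, reward_fn, cond, noise, alphas, sigmas, train_steps):
    """One DRTune-style forward pass through a simplified DDIM-like sampler.

    The reward on the final sample is back-propagated to the input noise,
    the gradient is stopped at every denoiser *input*, and only the steps
    listed in `train_steps` are run with gradients enabled.
    """
    x = noise
    for i in range(len(alphas)):
        x_in = x.detach()  # stop-gradient on the denoiser input
        if i in train_steps:
            # Trainable step: gradients reach the network through eps only.
            eps = denoiser(x_in, i, cond)
        else:
            with torch.no_grad():  # frozen step: no gradient contribution
                eps = denoiser(x_in, i, cond)
        # The linear update keeps a differentiable path from the final
        # sample back to every step's prediction and to the input noise.
        x = alphas[i] * x + sigmas[i] * eps
    return -reward_fn(x, cond)  # minimize the negative reward


# Toy usage: a tiny MLP stands in for the denoiser, and an arbitrary
# differentiable function stands in for the reward model.
denoiser = nn.Sequential(nn.Linear(8, 64), nn.SiLU(), nn.Linear(64, 8))
reward = lambda x, c: -(x - 1.0).pow(2).mean()
alphas = torch.linspace(0.95, 0.99, 20)
sigmas = 1.0 - alphas
loss = drtune_loss(lambda x, t, c: denoiser(x), reward, None,
                   torch.randn(4, 8), alphas, sigmas, train_steps={15, 17, 19})
loss.backward()  # gradients flow only into the selected steps' computations
```

Because the untrained steps run under `torch.no_grad()` and every denoiser input is detached, memory and compute for back-propagation scale with the number of trained steps rather than the full sampling depth, which is the efficiency side of the depth-efficiency trade-off the summary refers to.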