This paper introduces DiLightNet, a novel method for fine-grained lighting control in text-driven diffusion-based image generation. Existing diffusion models can generate images under various lighting conditions, but they tend to correlate image content with lighting, and text prompts lack the expressive power to describe detailed lighting setups. DiLightNet addresses this by augmenting text prompts with detailed lighting information in the form of radiance hints: visualizations of the scene geometry rendered with a homogeneous material under the target lighting. However, the scene geometry needed to produce the radiance hints is unknown. The key observation is that exact radiance hints are not necessary; it suffices to guide the diffusion model in the right direction.

Based on this observation, a three-stage method is introduced for controlling lighting during image generation. In the first stage, a standard pretrained diffusion model generates a provisional image under uncontrolled lighting. In the second stage, the foreground object is resynthesized using DiLightNet, a refined diffusion model conditioned on radiance hints computed from a coarse shape of the foreground object inferred from the provisional image. To retain texture details, the radiance hints are multiplied with a neural encoding of the provisional image before being passed to DiLightNet.
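To make this conditioning step concrete, the sketch below shows one way the radiance hints could be combined multiplicatively with a learned encoding of the provisional image. The encoder architecture, channel counts, and the exact layout of the product are illustrative assumptions, not the paper's actual implementation; the summary only states that the hints are multiplied with a neural encoding of the provisional image before being passed to DiLightNet.

```python
import torch
import torch.nn as nn

class ProvisionalEncoder(nn.Module):
    """Small CNN mapping the provisional image to per-pixel features.
    The architecture and channel count are illustrative assumptions."""
    def __init__(self, out_channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv2d(32, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, provisional_rgb: torch.Tensor) -> torch.Tensor:
        # provisional_rgb: (B, 3, H, W), values in [0, 1]
        return self.net(provisional_rgb)

def lighting_condition(provisional_rgb: torch.Tensor,
                       radiance_hints: torch.Tensor,
                       encoder: ProvisionalEncoder) -> torch.Tensor:
    """Multiply channel-stacked radiance hints with the neural encoding of
    the provisional image to form the lighting conditioning signal.

    radiance_hints: (B, Nh, H, W) renderings of the coarse foreground shape
    with homogeneous materials under the target lighting. The product lets
    the hints carry the target lighting while the encoding carries texture
    detail from the provisional image.
    """
    feats = encoder(provisional_rgb)                    # (B, C, H, W)
    # Broadcast so every hint channel modulates every feature channel.
    cond = radiance_hints[:, :, None] * feats[:, None]  # (B, Nh, C, H, W)
    return cond.flatten(1, 2)                           # (B, Nh*C, H, W)
```

The resulting tensor would then serve as the image-space conditioning input to the refined diffusion model; that part of the pipeline is omitted here.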
In the third stage, the background is resynthesized to be consistent with the lighting on the foreground object. The method is validated on a variety of text prompts and lighting conditions, demonstrating its effectiveness in controlling lighting during image generation. The paper also discusses related work, including diffusion models for image generation, single-image relighting, and other approaches to controlling lighting in diffusion-based image generation. The results show that DiLightNet produces plausible images that match both the text prompt and the target lighting, and that the method is robust across a wide range of lighting conditions. The paper concludes that DiLightNet offers a novel approach to lighting control in diffusion-based image generation, with potential applications in estimating reflectance properties from a single photograph and in text-to-3D generation with rich material properties.
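For completeness, the sketch below illustrates what a radiance hint is in this context: the coarse foreground geometry shaded with a homogeneous material under the target lighting. It uses simple Lambertian and Blinn-Phong shading under a small set of directional lights purely for illustration; the shading model, light representation, and parameters are assumptions and do not reflect how the paper actually renders its hints.

```python
import numpy as np

def radiance_hints(normals: np.ndarray,
                   mask: np.ndarray,
                   light_dirs: np.ndarray,
                   light_colors: np.ndarray,
                   view_dir=(0.0, 0.0, 1.0),
                   shininess: float = 64.0):
    """Illustrative radiance hints from a coarse foreground shape.

    normals:      (H, W, 3) unit normals of the coarse shape
    mask:         (H, W)    foreground mask
    light_dirs:   (L, 3)    unit directions toward each directional light
    light_colors: (L, 3)    RGB intensities approximating the target lighting

    Returns a diffuse hint and a specular hint, each (H, W, 3): the same
    geometry shaded with a homogeneous Lambertian and a homogeneous glossy
    material under the target lighting.
    """
    v = np.asarray(view_dir, dtype=np.float64)
    v = v / np.linalg.norm(v)

    # Lambertian hint: sum over lights of max(N . L, 0) * light color.
    n_dot_l = np.clip(normals @ light_dirs.T, 0.0, None)     # (H, W, L)
    diffuse = n_dot_l @ light_colors                          # (H, W, 3)

    # Blinn-Phong hint: max(N . H, 0)^shininess * light color.
    half = light_dirs + v                                     # (L, 3)
    half = half / np.linalg.norm(half, axis=1, keepdims=True)
    n_dot_h = np.clip(normals @ half.T, 0.0, None)            # (H, W, L)
    specular = (n_dot_h ** shininess) @ light_colors          # (H, W, 3)

    m = mask[..., None].astype(np.float64)
    return diffuse * m, specular * m
```

Rendering the same geometry with several specular exponents yields a stack of hints that can be concatenated along the channel axis and used as the `radiance_hints` input of the conditioning sketch above.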