The paper introduces CRS-Diff, a novel controllable generative foundation model for remote sensing (RS) image generation. It leverages diffusion models and advanced control mechanisms to support multiple conditioning inputs, including text, metadata, and image conditions, enhancing the precision and stability of image generation. CRS-Diff integrates a conditional control mechanism for multi-scale feature fusion, improving the effectiveness of control conditions. Experimental results demonstrate that CRS-Diff outperforms existing methods in generating high-quality RS images under both single-condition and multi-condition scenarios, both quantitatively and qualitatively. Additionally, CRS-Diff can serve as a data engine for generating high-quality training data for downstream tasks, such as road extraction. The code for CRS-Diff is available on GitHub.
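To make the idea of multi-scale condition fusion concrete, here is a minimal, hypothetical NumPy sketch of one common way such fusion can work: a condition encoding (e.g. an encoded sketch or segmentation map) is downsampled to each backbone scale and added to the corresponding feature map with a per-scale weight. The function names, shapes, and weighting scheme are illustrative assumptions, not CRS-Diff's actual implementation.

```python
import numpy as np

def downsample(x, factor):
    # Average-pool an (H, W, C) feature map by an integer factor.
    h, w, c = x.shape
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def fuse_conditions(backbone_feats, cond_map, weights):
    """Fuse a condition encoding into backbone features at every scale.

    backbone_feats: list of (H_i, W_i, C) arrays, coarsest last (hypothetical)
    cond_map: full-resolution (H, W, C) condition encoding (hypothetical)
    weights: per-scale fusion weights (illustrative assumption)
    """
    base_h = cond_map.shape[0]
    fused = []
    for feat, w in zip(backbone_feats, weights):
        factor = base_h // feat.shape[0]
        fused.append(feat + w * downsample(cond_map, factor))
    return fused

# Toy example: fuse one 8x8 condition map into features at three scales.
rng = np.random.default_rng(0)
cond = rng.normal(size=(8, 8, 4))
feats = [rng.normal(size=(8 // f, 8 // f, 4)) for f in (1, 2, 4)]
fused = fuse_conditions(feats, cond, weights=[1.0, 0.5, 0.25])
print([f.shape for f in fused])  # [(8, 8, 4), (4, 4, 4), (2, 2, 4)]
```

In practice such fusion is typically learned (e.g. with convolutional projections per condition, as in ControlNet-style designs) rather than fixed averaging; this sketch only shows the multi-scale additive structure.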