The paper addresses cloud removal in optical remote sensing (RS) images, where cloud cover significantly degrades image quality and limits downstream usability. Existing deep-learning (DL)-based cloud removal (CR) techniques often struggle to accurately reconstruct the original visual authenticity and detailed semantic content of images. To tackle this issue, the authors propose a novel diffusion-based framework called Diffusion Enhancement (DE) together with a Weight Allocation (WA) network. The DE framework leverages the progressive texture-detail recovery capabilities of diffusion models, while the WA network dynamically adjusts the weights used for feature fusion, improving performance, especially in ultra-resolution image generation. Additionally, a coarse-to-fine training strategy is employed to accelerate convergence and reduce computational cost. The authors also establish an ultra-resolution benchmark named CUHK Cloud Removal (CUHK-CR) with 0.5 m spatial resolution, featuring rich detailed textures and diverse cloud coverage. Extensive experiments on the CUHK-CR and RICE datasets demonstrate that the proposed DE framework outperforms existing DL-based methods in terms of both perceptual quality and signal fidelity. The main contributions of the work are the introduction of the DE network, the development of the WA network, and the establishment of the CUHK-CR benchmark.
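To make the WA network's role concrete, the sketch below illustrates one plausible form of dynamic weighted feature fusion: a small convolutional head predicts a per-pixel weight map from two branches' features (e.g., a diffusion branch and a reference CR branch) and blends them as a convex combination. This is a minimal illustrative sketch, not the paper's actual architecture; the module name, layer sizes, and branch names are assumptions.

```python
import torch
import torch.nn as nn


class WeightAllocationFusion(nn.Module):
    """Hypothetical sketch of dynamic weighted feature fusion.

    Predicts a per-pixel weight map from the concatenated features of two
    branches and blends them accordingly. All design choices here are
    illustrative assumptions, not the paper's WA network.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # one fusion weight in [0, 1] per spatial location
        )

    def forward(self, feat_diffusion: torch.Tensor, feat_reference: torch.Tensor) -> torch.Tensor:
        # Predict how much to trust the diffusion branch at each pixel,
        # then take a convex combination of the two feature maps.
        w = self.weight_net(torch.cat([feat_diffusion, feat_reference], dim=1))
        return w * feat_diffusion + (1.0 - w) * feat_reference


if __name__ == "__main__":
    fusion = WeightAllocationFusion(channels=64)
    a = torch.randn(1, 64, 128, 128)  # diffusion-branch features (dummy data)
    b = torch.randn(1, 64, 128, 128)  # reference-branch features (dummy data)
    print(fusion(a, b).shape)  # torch.Size([1, 64, 128, 128])
```

Because the weight map is spatially varying, such a fusion scheme can, in principle, lean on the diffusion branch where fine texture must be hallucinated and on the other branch where the cloud-free signal is already reliable, which matches the paper's stated motivation for adaptive weighting.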