XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution

19 Jul 2024 | Yunpeng Qu, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun and Chao Zhou
This paper proposes XPSR, a framework for image super-resolution (ISR) that leverages cross-modal priors from multimodal large language models (MLLMs) to strengthen diffusion-based restoration. XPSR extracts both high-level and low-level semantic priors from an MLLM and uses them to guide the diffusion model toward high-fidelity, realistic outputs. Two components make this work: a Semantic-Fusion Attention (SFA) module that fuses the cross-modal semantic priors into the diffusion model, and a Degradation-Free Constraint (DFC) that pushes the model to extract semantic-preserving information from the low-resolution (LR) input rather than its degradation artifacts.

The framework is built on a pre-trained Stable Diffusion model with a ControlNet conditioning branch, into which the SFA and DFC modules are integrated. Evaluated on both synthetic and real-world benchmarks, XPSR outperforms state-of-the-art methods on image quality metrics and produces more realistic, detailed results, demonstrating its effectiveness at restoring semantic detail in degraded images.
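To make the two components concrete, here is a minimal PyTorch sketch of one plausible reading of them: SFA as cross-attention from diffusion features to MLLM caption tokens, and DFC as a feature-alignment loss between the degraded LR branch and clean HR features. The class names, tensor shapes, and the MSE form of the constraint are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticFusionAttention(nn.Module):
    """Cross-attention that injects MLLM-derived semantic priors
    (e.g. token embeddings of high-/low-level captions) into
    flattened spatial features of the diffusion U-Net."""

    def __init__(self, feat_dim: int, prior_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=feat_dim, num_heads=num_heads,
            kdim=prior_dim, vdim=prior_dim, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, feats: torch.Tensor, priors: torch.Tensor) -> torch.Tensor:
        # feats:  (B, N, feat_dim)  flattened spatial features
        # priors: (B, L, prior_dim) MLLM semantic-description tokens
        fused, _ = self.attn(query=self.norm(feats), key=priors, value=priors)
        return feats + fused  # residual fusion keeps the original features intact


def degradation_free_constraint(lr_feats: torch.Tensor,
                                hr_feats: torch.Tensor) -> torch.Tensor:
    """Assumed form of the DFC: pull features extracted from the degraded
    LR input toward features of the clean HR image, so the conditioning
    branch encodes semantics rather than degradation artifacts."""
    return F.mse_loss(lr_feats, hr_feats.detach())
```

In use, a module like `SemanticFusionAttention` would sit inside the ControlNet/U-Net blocks, attending from spatial features to the semantic prior tokens, while a term like `degradation_free_constraint` would be added to the training loss to align the LR-branch features with their degradation-free counterparts.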