PromptCIR: Blind Compressed Image Restoration with Prompt Learning


26 Apr 2024 | Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen
**Authors:** Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen
**Institution:** University of Science and Technology of China; Bytedance Inc.

**Abstract:** Blind Compressed Image Restoration (CIR) aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing methods often rely on quality factor prediction networks, but these lack spatial information, which limits their adaptability. PromptCIR, a prompt-learning-based approach, encodes compression information implicitly through prompts that interact with soft weights generated from image features. This dynamic content-aware and distortion-aware guidance enhances the restoration process. The method leverages a transformer-based backbone and a dynamic prompt module, and achieved first place in the NTIRE 2024 challenge for blind compressed image enhancement.

**Key Contributions:**
1. **Prompt Learning:** PromptCIR uses lightweight prompts to implicitly encode content-aware and distortion-aware information, providing dynamic guidance for restoration.
2. **Transformer-Based Backbone:** The method employs a powerful transformer backbone to handle complex image restoration tasks.
3. **Hybrid Attention Block:** A residual hybrid attention group (RHAG) is used in the first two stages to enhance local and global information extraction, improving the restoration of texture details.

**Methodology:**
- **Prompt Block:** Comprises a dynamic prompt generation module (DPM) and a prompt interaction module (PIM). The DPM generates soft weights from image features, which interact with learnable prompt bases to form the final prompts (see the sketch after this list).
- **Hybrid Attention Block (RHAG):** Combines shifted-window-based attention and convolution to enhance local and global information extraction.
- **Overall Framework:** A U-shaped structure with a 4-stage encoder-decoder for hierarchical restoration, using prompts for content-aware and distortion-aware guidance (a high-level sketch appears at the end of this summary).
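The summary contains no code, so the following is a minimal PyTorch sketch of a PromptIR-style prompt block consistent with the description above: learnable prompt bases are mixed by soft weights derived from image features (DPM), and the resulting prompt is fused back into the features (PIM). All module names, layer choices, and hyperparameters here are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicPromptBlock(nn.Module):
    """Illustrative prompt block: a dynamic prompt generation module (DPM)
    derives soft weights from image features and mixes a set of learnable
    prompt bases; a prompt interaction module (PIM) then fuses the resulting
    prompt with the features. Layer choices are assumptions for illustration."""

    def __init__(self, channels, num_bases=5, prompt_dim=64, prompt_size=16):
        super().__init__()
        # Learnable prompt bases; content/distortion priors are learned implicitly.
        self.prompt_bases = nn.Parameter(
            torch.randn(num_bases, prompt_dim, prompt_size, prompt_size))
        # DPM: pooled image features -> soft weights over the prompt bases.
        self.to_weights = nn.Linear(channels, num_bases)
        # Project the mixed prompt to the feature channel dimension.
        self.prompt_proj = nn.Conv2d(prompt_dim, channels, 3, padding=1)
        # PIM: fuse concatenated [features, prompt] back to `channels`.
        self.fuse = nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, feat):                                  # feat: (B, C, H, W)
        _, _, h, w = feat.shape
        # Soft weights from globally pooled features (content/distortion cue).
        weights = F.softmax(self.to_weights(feat.mean(dim=(2, 3))), dim=1)  # (B, N)
        # Weighted combination of the learnable prompt bases.
        prompt = torch.einsum('bn,ndhw->bdhw', weights, self.prompt_bases)
        # Resize to the feature resolution and match channels.
        prompt = F.interpolate(prompt, size=(h, w), mode='bilinear', align_corners=False)
        prompt = self.prompt_proj(prompt)
        # Prompt interaction: concatenate and fuse for dynamic guidance.
        return self.fuse(torch.cat([feat, prompt], dim=1))


# Usage: guide 96-channel decoder features with a dynamically generated prompt.
block = DynamicPromptBlock(channels=96)
out = block(torch.randn(2, 96, 64, 64))          # -> (2, 96, 64, 64)
```

Because the prompts are formed from soft weights rather than a hard quality-factor prediction, such a block can adapt its guidance spatially to both image content and the unknown compression level.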
**Experiments:**
- **Training Datasets:** DF2K and LSDIR (a large-scale, high-quality dataset).
- **Optimization Details:** A two-stage training strategy with a total batch size of 24 on 8 NVIDIA Tesla V100 GPUs.
- **Evaluation:** Performance on blind and non-blind CIR benchmarks, showing superior results compared with existing methods.

**Conclusion:** PromptCIR effectively addresses blind CIR by leveraging prompt learning and a powerful transformer backbone, achieving state-of-the-art performance in the NTIRE 2024 challenge. Extensive experiments validate its effectiveness on both blind and non-blind CIR tasks.
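For concreteness, below is a high-level sketch of how the modules described in the Methodology section could be assembled into a 4-stage U-shaped encoder-decoder with prompt-guided decoding. It reuses the hypothetical `DynamicPromptBlock` from the earlier sketch; the block counts, channel widths, and the plain convolutional stages standing in for RHAG / transformer blocks are simplifications, not the authors' architecture.

```python
import torch
import torch.nn as nn


class PromptCIRSketch(nn.Module):
    """High-level sketch of a U-shaped, 4-stage encoder-decoder with prompt
    blocks guiding the decoder. Assumes DynamicPromptBlock (from the earlier
    sketch) is defined in the same file; stage designs are placeholders."""

    def __init__(self, in_ch=3, base_ch=48):
        super().__init__()
        ch = [base_ch, base_ch * 2, base_ch * 4, base_ch * 8]
        self.embed = nn.Conv2d(in_ch, ch[0], 3, padding=1)
        # Encoder stages with strided-conv downsampling between them.
        self.enc = nn.ModuleList([self._stage(c) for c in ch])
        self.down = nn.ModuleList(
            [nn.Conv2d(ch[i], ch[i + 1], 2, stride=2) for i in range(3)])
        # Decoder stages with upsampling and skip connections from the encoder.
        self.up = nn.ModuleList(
            [nn.ConvTranspose2d(ch[i + 1], ch[i], 2, stride=2) for i in range(3)])
        self.dec = nn.ModuleList([self._stage(c) for c in ch[:3]])
        # One prompt block per decoder stage for content/distortion-aware guidance.
        self.prompts = nn.ModuleList([DynamicPromptBlock(c) for c in ch[:3]])
        self.out = nn.Conv2d(ch[0], in_ch, 3, padding=1)

    @staticmethod
    def _stage(c):
        # Placeholder for RHAG / transformer blocks in the real model.
        return nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.GELU(),
                             nn.Conv2d(c, c, 3, padding=1))

    def forward(self, x):                        # x: compressed image, H,W % 8 == 0
        f = self.embed(x)
        skips = []
        for i in range(3):                       # encoder path
            f = self.enc[i](f)
            skips.append(f)
            f = self.down[i](f)
        f = self.enc[3](f)                       # bottleneck stage
        for i in reversed(range(3)):             # decoder path with prompt guidance
            f = self.up[i](f) + skips[i]
            f = self.prompts[i](f)
            f = self.dec[i](f)
        return x + self.out(f)                   # residual restoration output


# Usage: restore a JPEG-compressed image of unknown quality factor.
net = PromptCIRSketch()
restored = net(torch.randn(1, 3, 128, 128))      # -> (1, 3, 128, 128)
```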