2022-03-11 | Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
Restormer is an efficient Transformer model designed for high-resolution image restoration. CNN-based restoration models struggle to capture long-range pixel dependencies, while standard self-attention scales quadratically with spatial resolution; Restormer is designed to model global dependencies while keeping computational complexity linear in the number of pixels. The key contributions include:
1. **Multi-Dconv Head Transposed Attention (MDTA)**: This module applies self-attention across the channel (feature) dimension rather than the spatial dimension, so the attention map scales with the number of channels instead of the number of pixels and the overall complexity grows linearly with image resolution. Depth-wise convolutions inject local context before attention, and the cross-covariance between queries and keys captures global context (a minimal sketch follows this list).
2. **Gated-Dconv Feed-Forward Network (GDFN)**: This network introduces a gating mechanism, the element-wise product of two parallel branches, to control which features flow forward and refine the output, and it uses depth-wise convolutions to encode spatial context (see the second sketch after this list).
3. **Progressive Learning**: The model is trained on small patches with large batches in early iterations and progressively larger patches with smaller batches later, which lets it learn global image statistics while keeping early training cheap (see the schedule sketch after this list).
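The following is a minimal sketch of channel-wise ("transposed") attention in the spirit of MDTA. Module and argument names are illustrative assumptions, not the official implementation; see <https://github.com/swz30/Restormer> for the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedAttention(nn.Module):
    """Illustrative MDTA-style attention: attention over channels, not pixels."""
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        # Learnable temperature scaling the C x C attention logits per head.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        # 1x1 conv mixes channels; 3x3 depth-wise conv adds local spatial context.
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)
        self.qkv_dwconv = nn.Conv2d(dim * 3, dim * 3, kernel_size=3, padding=1,
                                    groups=dim * 3, bias=False)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv_dwconv(self.qkv(x)).chunk(3, dim=1)

        # Reshape to (batch, heads, channels_per_head, pixels).
        def to_heads(t: torch.Tensor) -> torch.Tensor:
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w)

        q, k, v = map(to_heads, (q, k, v))
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)

        # Cross-covariance attention: a (C/heads) x (C/heads) map per head,
        # so cost grows linearly with the number of pixels h*w.
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.project_out(out)
```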
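Next, a minimal sketch of a gated depth-wise-conv feed-forward network in the spirit of GDFN; the class name and the expansion factor are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedDConvFFN(nn.Module):
    """Illustrative GDFN-style feed-forward block with a gating branch."""
    def __init__(self, dim: int, expansion: float = 2.66):
        super().__init__()
        hidden = int(dim * expansion)
        # One 1x1 conv produces two parallel branches; a 3x3 depth-wise conv
        # then encodes local spatial context in both.
        self.project_in = nn.Conv2d(dim, hidden * 2, kernel_size=1, bias=False)
        self.dwconv = nn.Conv2d(hidden * 2, hidden * 2, kernel_size=3, padding=1,
                                groups=hidden * 2, bias=False)
        self.project_out = nn.Conv2d(hidden, dim, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = self.dwconv(self.project_in(x)).chunk(2, dim=1)
        # Gating: the GELU-activated branch controls how much of the other
        # branch's features pass on to the next stage.
        return self.project_out(F.gelu(x1) * x2)
```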
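Finally, a sketch of a progressive patch-size schedule. The function name is hypothetical and the iteration boundaries, crop sizes, and batch sizes below are indicative of the kind of small-to-large schedule the paper describes; the exact values should be checked against the paper and official code.

```python
def patch_size_for_iter(it: int) -> tuple[int, int]:
    """Return an (patch_size, batch_size) pair for a given training iteration."""
    # (end_iteration, square crop size, batch size); assumed values for illustration.
    schedule = [
        (92_000, 128, 64),   # early phase: small crops, large batches
        (156_000, 160, 40),
        (204_000, 192, 32),
        (240_000, 256, 16),
        (276_000, 320, 8),
        (300_000, 384, 8),   # final phase: largest crops, smallest batches
    ]
    for end_iter, patch, batch in schedule:
        if it < end_iter:
            return patch, batch
    # Past the last boundary, keep training at the largest crop size.
    return schedule[-1][1], schedule[-1][2]
```

Because inference runs on full-resolution images, mixing in larger crops late in training narrows the train/test gap while the cheap small-crop phase keeps total training cost manageable.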
Restormer achieves state-of-the-art performance on various image restoration tasks, including deraining, motion deblurring, defocus deblurring, and denoising. Extensive experiments on 16 benchmark datasets demonstrate its effectiveness and efficiency. The source code and pre-trained models are available at <https://github.com/swz30/Restormer>.