EDVR: Video Restoration with Enhanced Deformable Convolutional Networks

7 May 2019 | Xintao Wang, Kelvin C.K. Chan, Ke Yu, Chao Dong, Chen Change Loy
The paper introduces EDVR, a unified framework built on enhanced deformable convolutional networks for video restoration tasks such as super-resolution and deblurring. The framework targets the large motions and diverse blurs common in real-world videos. EDVR consists of two main components: a Pyramid, Cascading and Deformable (PCD) alignment module and a Temporal and Spatial Attention (TSA) fusion module.

1. **PCD Alignment Module**: Aligns neighboring frames to the reference frame at the feature level using deformable convolutions in a coarse-to-fine manner. A pyramid structure handles large and complex motions by refining the alignment level by level, and an additional cascading deformable convolution further improves the robustness of the alignment (a sketch follows below).
2. **TSA Fusion Module**: Fuses the aligned features using temporal and spatial attention. Temporal attention weights each frame by its similarity to the reference frame, while spatial attention modulates features according to their informativeness, so that the most informative frames and locations drive reconstruction (a sketch follows further below).

Extensive experiments on the NTIRE19 benchmark show that EDVR outperforms state-of-the-art methods in all four tracks of the video restoration and enhancement challenges, and it also achieves superior results on existing benchmarks for video super-resolution and deblurring. The code for EDVR is available at <https://github.com/xinntao/EDVR>.
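The following is a minimal sketch of how a PCD-style (pyramid, cascading, deformable) alignment step could be written in PyTorch, assuming `torchvision.ops.DeformConv2d` and a feature pyramid in which each level doubles the spatial resolution of the one below it. Channel counts, module names (`PCDAlignSketch`, `offset_convs`, etc.) and the exact offset-refinement scheme are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of PCD-style alignment with coarse-to-fine deformable convolutions.
# Assumes PyTorch + torchvision; sizes and names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d


class PCDAlignSketch(nn.Module):
    """Align a neighboring frame's feature pyramid to the reference frame's pyramid."""

    def __init__(self, channels=64, levels=3):
        super().__init__()
        self.levels = levels
        # Per level: predict 3x3 deformable-conv offsets (2 * 3 * 3 = 18 channels)
        # from the concatenated [neighbor, reference] features.
        self.offset_convs = nn.ModuleList(
            nn.Conv2d(channels * 2, 2 * 3 * 3, 3, padding=1) for _ in range(levels)
        )
        self.deform_convs = nn.ModuleList(
            DeformConv2d(channels, channels, 3, padding=1) for _ in range(levels)
        )
        # Cascading refinement at the finest level.
        self.cas_offset_conv = nn.Conv2d(channels * 2, 2 * 3 * 3, 3, padding=1)
        self.cas_deform_conv = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, nbr_pyramid, ref_pyramid):
        # Pyramids are lists ordered coarse -> fine; each level doubles H and W.
        aligned, offset = None, None
        for lvl in range(self.levels):
            nbr, ref = nbr_pyramid[lvl], ref_pyramid[lvl]
            cur_offset = self.offset_convs[lvl](torch.cat([nbr, ref], dim=1))
            if offset is not None:
                # Coarse-to-fine: reuse the upsampled coarser offsets, scaled by 2
                # because displacements double when spatial resolution doubles.
                cur_offset = cur_offset + 2 * F.interpolate(
                    offset, scale_factor=2, mode="bilinear", align_corners=False
                )
            cur_aligned = self.deform_convs[lvl](nbr, cur_offset)
            if aligned is not None:
                cur_aligned = cur_aligned + F.interpolate(
                    aligned, scale_factor=2, mode="bilinear", align_corners=False
                )
            aligned, offset = cur_aligned, cur_offset
        # Cascading: one more deformable conv conditioned on the reference features.
        cas_offset = self.cas_offset_conv(torch.cat([aligned, ref_pyramid[-1]], dim=1))
        return self.cas_deform_conv(aligned, cas_offset)
```

The key idea captured here is that offsets estimated at coarser levels guide the finer levels, which is what makes the alignment robust to large motions.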
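Similarly, the sketch below illustrates one way a TSA-style fusion could look in PyTorch: temporal attention is a per-pixel sigmoid of the dot-product similarity between each aligned frame's embedding and the reference frame's embedding, and spatial attention is a lightweight mask over the fused features. Layer sizes and names are assumptions for illustration; the paper's spatial attention uses a pyramid design that is omitted here.

```python
# Sketch of TSA-style (temporal and spatial attention) fusion.
# Assumes PyTorch; layer sizes and names are illustrative only.
import torch
import torch.nn as nn


class TSAFusionSketch(nn.Module):
    def __init__(self, channels=64, num_frames=5, center=2):
        super().__init__()
        self.center = center
        # Embeddings used to measure temporal similarity against the reference frame.
        self.emb_ref = nn.Conv2d(channels, channels, 3, padding=1)
        self.emb_nbr = nn.Conv2d(channels, channels, 3, padding=1)
        # Fuse the temporally weighted frames into a single feature map.
        self.fuse = nn.Conv2d(num_frames * channels, channels, 1)
        # Lightweight spatial-attention head (the paper uses a pyramid design).
        self.spatial_att = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, aligned):  # aligned: (B, T, C, H, W), already PCD-aligned
        b, t, c, h, w = aligned.shape
        ref_emb = self.emb_ref(aligned[:, self.center])                      # (B, C, H, W)
        nbr_emb = self.emb_nbr(aligned.view(-1, c, h, w)).view(b, t, c, h, w)
        # Temporal attention: per-pixel similarity of each frame to the reference.
        sim = torch.sigmoid((nbr_emb * ref_emb.unsqueeze(1)).sum(dim=2, keepdim=True))
        weighted = (aligned * sim).view(b, t * c, h, w)
        fused = self.fuse(weighted)
        # Spatial attention: modulate the fused features by their informativeness.
        mask = torch.sigmoid(self.spatial_att(fused))
        return fused * mask + fused


# Usage with hypothetical shapes: 5 aligned frames of 64-channel features.
feats = torch.randn(2, 5, 64, 32, 32)
fused = TSAFusionSketch()(feats)   # -> (2, 64, 32, 32)
```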