Neural Video Compression with Feature Modulation

29 Feb 2024 | Jiahao Li, Bin Li, Yan Lu
The paper "Neural Video Compression with Feature Modulation" by Jiahao Li, Bin Li, and Yan Lu of Microsoft Research Asia addresses two limitations of existing conditional coding-based neural video codecs (NVCs), namely a narrow supported quality range and poor handling of long prediction chains, and proposes a new codec, DCVC-FM, to overcome them.

1. **Wide Quality Range**: The authors introduce a learnable quantization scaler and a uniform quantization parameter sampling mechanism to modulate the latent features of the current frame. This allows the codec to support a quality range of about 11.4 dB in PSNR, compared with an average of 3.8 dB for previous NVCs.
2. **Long Prediction Chain**: To counter the quality degradation that builds up over long prediction chains, the authors propose periodically refreshing the temporal features. This helps maintain quality across frames, even in a single intra-frame setting.
3. **Versatility and Efficiency**: The proposed codec supports both RGB and YUV colorspaces in a single model without additional fine-tuning. It also enables low-precision inference, which reduces memory usage and computational complexity with negligible loss in compression ratio.
4. **Performance**: The experimental results show that DCVC-FM outperforms traditional codecs and previous state-of-the-art (SOTA) NVCs in compression ratio and quality. Specifically, it achieves a 29.7% bitrate saving over the previous SOTA NVC, DCVC-DC, with a 16% reduction in multiply-accumulate operations (MACs).
5. **Ablation Study**: The paper includes an ablation study that evaluates each component of the proposed codec and shows that each mechanism contributes significantly to the improved performance.
6. **Conclusion and Limitations**: The authors conclude that DCVC-FM represents a significant step forward in NVC technology, but note that further work is needed on real-time speed and on cross-platform issues in entropy coding.

Overall, the paper offers solutions to several critical problems of NVCs, making it a notable landmark in the evolution of neural video compression.
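To make the quality-range idea more concrete, the following is a minimal sketch of how a per-QP quantization scaler can modulate a latent before rounding, and how a QP can be drawn uniformly for each training sample. It is not the authors' implementation. The names `make_scalers`, `quantize`, `dequantize`, and `sample_qp`, the 64 quality levels, the step range of 8.0 down to 0.5, and the convention that a higher QP means finer quantization are all assumptions made for illustration. In a trained codec the scaler table would be learned rather than fixed by a formula.

```python
import random

NUM_QP = 64  # hypothetical number of quality levels


def make_scalers(num_qp=NUM_QP, coarse=8.0, fine=0.5):
    """Stand-in for a learned table: one quantization step per QP.

    Steps are spaced geometrically from `coarse` (QP 0) to `fine`
    (last QP). A real codec would train these values instead.
    """
    ratio = (fine / coarse) ** (1.0 / (num_qp - 1))
    return [coarse * ratio ** qp for qp in range(num_qp)]


def quantize(latent, step):
    """Divide by the step and round: the integers an entropy coder would see."""
    return [round(v / step) for v in latent]


def dequantize(symbols, step):
    """Multiply back by the same step to approximate the original latent."""
    return [s * step for s in symbols]


def sample_qp(rng, num_qp=NUM_QP):
    """Draw a QP uniformly, so every quality level is seen during training."""
    return rng.randrange(num_qp)


scalers = make_scalers()
latent = [0.3, -2.7, 5.1, 12.4]  # toy latent values, not real codec data

# Compare the coarsest and finest quality levels on the same latent.
for qp in (0, NUM_QP - 1):
    step = scalers[qp]
    symbols = quantize(latent, step)
    recon = dequantize(symbols, step)
    max_err = max(abs(a - b) for a, b in zip(latent, recon))
    print(f"qp={qp} step={step:.3f} symbols={symbols} max_err={max_err:.3f}")

# One uniformly sampled QP, as a training loop might draw per sample.
rng = random.Random(0)
print("sampled qp:", sample_qp(rng))
```

A larger step yields smaller integer symbols, which cost fewer bits but reconstruct less accurately, so a single table indexed by QP gives one model many rate-quality operating points. Uniform sampling is one simple way to ensure no quality level is neglected during training.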