Boosting Neural Representations for Videos with a Conditional Decoder

16 Mar 2024 | Xinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang
This paper introduces a universal boosting framework for implicit neural representations (INRs) in video processing, aiming to enhance their representation capabilities and improve performance across various tasks. The framework leverages a conditional decoder with a temporal-aware affine transform (TAT) module, which uses the frame index as a prior condition to align intermediate features with target frames more effectively. Additionally, the authors introduce a sinusoidal NeRV-like (SNeRV) block to generate diverse intermediate features and achieve a more balanced parameter distribution, thereby enhancing the model's capacity. The approach also incorporates a high-frequency information-preserving reconstruction loss to retain intricate details in the reconstructed videos. Furthermore, a consistent entropy minimization (CEM) technique is developed to ensure consistency between training and inference, improving the coding efficiency of INR-based video codecs. Experimental results on the UVG dataset demonstrate that the boosted INRs outperform baseline methods in video regression, compression, inpainting, and interpolation tasks, offering superior rate-distortion performance compared to traditional and learning-based codecs.
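To make the conditional-decoding idea concrete, here is a minimal PyTorch sketch in the spirit of the TAT module. All names, channel sizes, the MLP depth, and the sinusoidal embedding scheme are illustrative assumptions rather than the authors' exact implementation: an embedding of the frame index is mapped to per-channel scale and shift parameters that modulate an intermediate decoder feature map.

```python
import math
import torch
import torch.nn as nn

class TemporalAffineTransform(nn.Module):
    """Illustrative sketch of a temporal-aware affine transform (TAT).

    A sinusoidal embedding of the frame index is mapped by a small MLP
    to per-channel scale/shift parameters, which modulate intermediate
    decoder features. Sizes and depth are assumptions, not the paper's.
    """

    def __init__(self, channels: int, embed_dim: int = 64):
        super().__init__()
        self.embed_dim = embed_dim
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, 2 * channels),  # scale and shift per channel
        )

    def embed(self, t: torch.Tensor) -> torch.Tensor:
        # Standard sinusoidal embedding of the (integer) frame index.
        half = self.embed_dim // 2
        freqs = torch.exp(
            -math.log(10000.0) * torch.arange(half, device=t.device) / half
        )
        angles = t.float()[:, None] * freqs[None, :]
        return torch.cat([angles.sin(), angles.cos()], dim=-1)

    def forward(self, feat: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) intermediate features; t: (B,) frame indices.
        scale, shift = self.mlp(self.embed(t)).chunk(2, dim=-1)
        return feat * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
```

The high-frequency information-preserving reconstruction loss can likewise be sketched as a pixel-space term plus a frequency-domain term; the weighting and the exact transform below are assumptions, not the paper's formulation:

```python
def hf_preserving_loss(pred: torch.Tensor, target: torch.Tensor,
                       alpha: float = 0.1) -> torch.Tensor:
    # Pixel-space L1 keeps overall fidelity; an L1 penalty on the FFT
    # difference discourages the model from blurring fine details.
    pixel = (pred - target).abs().mean()
    freq = (torch.fft.rfft2(pred) - torch.fft.rfft2(target)).abs().mean()
    return pixel + alpha * freq
```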