Variational Image Compression with a Scale Hyperprior

1 May 2018 | Johannes Ballé*, David Minnen*, Saurabh Singh*, Sung Jin Hwang*, Nick Johnston*
This paper presents a variational image compression model that incorporates a hyperprior to capture spatial dependencies in the latent representation. The hyperprior is trained end-to-end, jointly with the underlying autoencoder, and acts as a prior on the parameters of the entropy model, which in turn serves as the prior on the latent representation. Capturing these statistical dependencies yields state-of-the-art image compression performance when quality is measured with the MS-SSIM index, and rate–distortion performance surpassing other published ANN-based methods when measured with PSNR, a metric derived from mean squared error.

The model is built on variational autoencoders (VAEs): probabilistic generative models augmented with approximate inference models. The synthesis and analysis transforms play the roles of the generative and inference models, respectively. Variational inference approximates the true posterior with a parametric variational density by minimizing the Kullback–Leibler (KL) divergence averaged over the data distribution; for this model, that minimization is equivalent to optimizing the autoencoder for rate–distortion performance, as sketched below.
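To make that equivalence concrete, the training objective can be written as a rate–distortion Lagrangian. This is the standard formulation for this family of models, restated here rather than quoted from the paper, and it matches the KL divergence up to constants when the distortion is interpreted as a negative log-likelihood:

```latex
% Rate–distortion objective (sketch).
% \hat{y}: quantized latents, \hat{z}: quantized side information,
% \hat{x}: reconstruction, \lambda: rate–distortion trade-off weight.
L \;=\;
  \underbrace{\mathbb{E}\bigl[-\log_2 p_{\hat{y}\mid\hat{z}}(\hat{y}\mid\hat{z})\bigr]
  \;+\; \mathbb{E}\bigl[-\log_2 p_{\hat{z}}(\hat{z})\bigr]}_{\text{rate } R}
  \;+\;
  \lambda\,\underbrace{\mathbb{E}\bigl[d(x,\hat{x})\bigr]}_{\text{distortion } D}
```

With d taken to be squared error, minimizing D maximizes PSNR; the paper also trains models with an MS-SSIM-based distortion.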
The prior on the latents in the baseline model is a non-parametric, fully factorized density; the hyperprior extends it so that spatial dependencies the factorized model cannot express are captured, and it is trained end-to-end with the rest of the model, yielding a more accurate prior for the latent representation (a minimal code sketch of this architecture follows the results paragraph below).

Evaluated on the Kodak dataset, the hyperprior model delivers significant improvements over existing methods: it outperforms conventional codecs and prior ANN-based methods when quality is measured with MS-SSIM, and surpasses published ANN-based methods when measured with PSNR. The results also show that the choice of distortion metric used during training is crucial, since optimizing for different metrics leads to significantly different outcomes. The model's ability to capture spatial dependencies in the latent representation is key to its state-of-the-art compression performance.
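The following is a minimal sketch of how such a scale hyperprior can be wired up, written in PyTorch purely for illustration. The class name ScaleHyperprior, the channel widths N and M, the layer counts, and the ReLU/Softplus activations are assumptions, not the paper's exact architecture (which uses GDN/IGDN nonlinearities and differs in depth and width). A hyper-analysis transform maps the latents y to side information z, and a hyper-synthesis transform maps the quantized z to the scales of a zero-mean Gaussian entropy model for y; additive uniform noise stands in for rounding during training.

```python
# Hedged sketch of a scale-hyperprior model; not the authors' implementation.
import torch
import torch.nn as nn

N, M = 128, 192  # assumed channel widths

class ScaleHyperprior(nn.Module):
    def __init__(self):
        super().__init__()
        # Analysis transform g_a: image -> latents y (downsamples by 8).
        self.g_a = nn.Sequential(
            nn.Conv2d(3, N, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(N, N, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(N, M, 5, stride=2, padding=2),
        )
        # Synthesis transform g_s: quantized latents -> reconstruction.
        self.g_s = nn.Sequential(
            nn.ConvTranspose2d(M, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(N, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(N, 3, 5, stride=2, padding=2, output_padding=1),
        )
        # Hyper-analysis h_a: latents y -> side information z.
        self.h_a = nn.Sequential(
            nn.Conv2d(M, N, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(N, N, 5, stride=2, padding=2),
        )
        # Hyper-synthesis h_s: quantized z -> positive scales for y's entropy model.
        self.h_s = nn.Sequential(
            nn.ConvTranspose2d(N, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.Conv2d(N, M, 3, stride=1, padding=1), nn.Softplus(),
        )

    @staticmethod
    def quantize(t):
        # Additive uniform noise as a differentiable proxy for rounding.
        return t + torch.empty_like(t).uniform_(-0.5, 0.5)

    def forward(self, x):
        # Assumes input height and width are divisible by 8.
        y = self.g_a(x)
        z_hat = self.quantize(self.h_a(y))
        sigma = self.h_s(z_hat)           # predicted scales, same shape as y
        y_hat = self.quantize(y)
        x_hat = self.g_s(y_hat)
        # Bits for y under a zero-mean Gaussian with the predicted scales,
        # integrated over the quantization bin [y_hat - 0.5, y_hat + 0.5].
        gauss = torch.distributions.Normal(0.0, sigma)
        p_y = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
        rate_y = -torch.log2(p_y.clamp_min(1e-9)).sum()
        return x_hat, rate_y
```

The rate of the side information itself, which the paper models with a factorized prior, is omitted here for brevity; a full training loop would add that term and the distortion weighted by the trade-off parameter.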
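Finally, since the evaluation hinges on PSNR being a direct function of mean squared error, the generic definition is recalled below as a small helper (assuming 8-bit images with a peak value of 255; this is the standard formula, not code from the paper):

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB between an image and its reconstruction."""
    mse = np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Lower MSE means higher PSNR, which is why models optimized for squared error fare well under this metric but not necessarily under MS-SSIM.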