Joint Autoregressive and Hierarchical Priors for Learned Image Compression
8 Sep 2018 | David Minnen, Johannes Ballé, George Toderici
This paper introduces a joint autoregressive and hierarchical prior model for learned image compression. By combining an autoregressive context model with a hierarchical prior, the model improves compression performance while remaining optimizable end to end. It outperforms previous state-of-the-art methods, achieving a 15.8% average reduction in file size over the previous best deep-learning-based method, which corresponds to a 59.8% size reduction over JPEG, more than 35% over WebP and JPEG2000, and bitstreams 8.4% smaller than BPG, the current state-of-the-art image codec. It is the first learning-based method to outperform BPG on both the PSNR and MS-SSIM distortion metrics.
The model is built on an autoencoder that learns a quantized latent representation of images. That representation is compressed with an entropy model: a prior over the latents that standard arithmetic coding algorithms can use to produce a compressed bitstream. To sharpen this prior, the entropy model is conditioned on a hierarchical prior: a Gaussian scale mixture (GSM) whose scale parameters are predicted by a hyperprior. The whole system trains end to end, jointly optimizing the quantized representation of the hyperprior, the conditional entropy model, and the base autoencoder.
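The connection between the entropy model and the bitstream size can be made concrete. An arithmetic coder spends about −log2 p bits on a symbol with probability p, so the rate of an integer-quantized latent under a Gaussian prior is the negative log of the probability mass the Gaussian assigns to that latent's quantization bin. A minimal sketch (the bin width of 1 and the probability floor are illustrative assumptions, not details taken from the paper):

```python
import math

def gaussian_cdf(x, mu, sigma):
    # Standard normal CDF evaluated via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bits_for_symbol(y_hat, mu, sigma):
    # Probability mass the Gaussian N(mu, sigma) assigns to the unit
    # quantization bin [y_hat - 0.5, y_hat + 0.5] around the integer latent.
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    # Ideal arithmetic-coding cost in bits, floored to avoid log(0).
    return -math.log2(max(p, 1e-12))

# A latent near the predicted mean is cheap to code...
cheap = bits_for_symbol(0, mu=0.0, sigma=1.0)
# ...while an outlier far from the mean is expensive.
costly = bits_for_symbol(5, mu=0.0, sigma=1.0)
```

This is why a sharper conditional prior (better-predicted mu and sigma) directly shrinks the bitstream: the more probability mass the model places on the latent actually being coded, the fewer bits the arithmetic coder spends on it.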
On the publicly available Kodak image set, the model achieves better rate-distortion performance than existing image codecs and learned models. Evaluated with multiscale structural similarity (MS-SSIM) as the quality metric, it outperforms all existing methods, including the standard codecs and other learning-based methods that were also optimized for MS-SSIM. On the Tecnick image set, it likewise shows better rate-distortion performance than all of the baseline methods.
In visual comparisons against a scale-only hyperprior model and the BPG and JPEG codecs at similar bit rates, the model delivers the highest visual quality. An ablation over entropy-model variants (fully factorized, scale-only hyperprior, mean & scale hyperprior, context-only, and context + hyperprior) finds the combined context + hyperprior model best in rate-distortion terms, and confirms it as the first learning-based method to outperform BPG on both the PSNR and MS-SSIM distortion metrics.
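The "context" in these ablations is an autoregressive model over the latents: each latent's distribution parameters may depend only on latents already decoded, typically enforced with a PixelCNN-style masked convolution. A minimal sketch of such a causal mask (the 5x5 kernel size and strict raster-scan causality are assumptions for illustration, not the paper's exact architecture):

```python
import numpy as np

def causal_mask(k=5):
    # Mask for a masked convolution: the kernel may only see positions
    # strictly before the center in raster-scan order, so the entropy
    # model never conditions on latents the decoder has not yet decoded.
    m = np.ones((k, k), dtype=np.float32)
    c = k // 2
    m[c, c + 1:] = 0.0   # zero out positions right of the center, same row
    m[c + 1:, :] = 0.0   # zero out all rows below the center
    m[c, c] = 0.0        # exclude the center itself (strictly causal)
    return m

mask = causal_mask(5)   # multiply this elementwise into the conv kernel
```

The hyperprior, by contrast, is side information sent ahead of the latents, so it can condition on the whole image; combining both sources of context is what the best-performing variant does.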