1 Mar 2017 | Lucas Theis, Wenzhe Shi, Andrew Cunningham & Ferenc Huszár
The paper introduces an approach to optimizing autoencoders for lossy image compression, motivated by the need for more flexible compression algorithms as media formats and hardware evolve. Quantization, a key component of lossy compression, is non-differentiable; the authors handle this by replacing the derivative of the rounding function with the derivative of a smooth approximation during backpropagation. This allows them to train deep autoencoders that are competitive with JPEG 2000 and outperform recent RNN-based methods. The network is designed to be computationally efficient and suitable for high-resolution images, and it can be optimized for specific content types and metrics.
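To make the quantization trick concrete, here is a minimal sketch in plain Python. It is a toy scalar model, not the authors' network: the names `quantize`, `quantize_grad`, and `fit_scale`, the learning rate, and the squared-error loss are all illustrative assumptions. It also assumes the identity as the smooth approximation, which is one simple choice. The forward pass rounds as usual; only the backward pass substitutes a derivative of 1 for the true derivative, which is zero almost everywhere.

```python
import math


def quantize(y: float) -> float:
    # Forward pass: ordinary rounding to the nearest integer (halves round up).
    return float(math.floor(y + 0.5))


def quantize_grad(y: float, surrogate: bool = True) -> float:
    # Backward pass. The true derivative of rounding is 0 almost everywhere.
    # With surrogate=True we use the derivative of the smooth stand-in
    # r(y) = y instead, which is 1 (an assumed, illustrative choice).
    return 1.0 if surrogate else 0.0


def fit_scale(x: float, target: float, w: float = 0.1, lr: float = 0.01,
              steps: int = 200, surrogate: bool = True) -> float:
    # Toy "encoder": code = quantize(w * x); the "decoder" is the identity.
    # Minimizes (code - target)^2 with respect to w by gradient descent.
    for _ in range(steps):
        y = w * x
        z = quantize(y)
        # Chain rule: dL/dw = 2 (z - target) * dz/dy * dy/dw
        grad_w = 2.0 * (z - target) * quantize_grad(y, surrogate) * x
        w -= lr * grad_w
    return w


if __name__ == "__main__":
    w_trained = fit_scale(x=2.0, target=3.0)
    print(quantize(w_trained * 2.0))                       # code after training
    print(fit_scale(x=2.0, target=3.0, surrogate=False))   # true gradient
```

With `surrogate=False` every gradient is zero, so the weight never leaves its initial value; with the surrogate derivative, the weight moves until the rounded code matches the target.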
The paper also discusses the use of sub-pixel convolutions and Gaussian scale mixtures for entropy coding, and presents experimental results in which the autoencoder outperforms JPEG 2000 and other methods on perceptual quality, measured by SSIM and mean opinion scores (MOS). The authors conclude by highlighting the potential of end-to-end trained autoencoders to adapt to new media formats, and the difficulty of developing perceptually relevant metrics to optimize for.
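The sub-pixel convolutions mentioned above are commonly described as an ordinary convolution followed by a rearrangement of channels into space. The sketch below shows only that rearrangement step on nested Python lists. The function name `pixel_shuffle` and the channel ordering (the usual depth-to-space layout) are assumptions made for illustration, not details confirmed by this summary, and the convolution itself is omitted.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) nested list.

    Assumed layout: output[c][h*r + i][w*r + j] = input[c*r*r + i*r + j][h][w].
    """
    channels_in = len(x)
    height, width = len(x[0]), len(x[0][0])
    if channels_in % (r * r) != 0:
        raise ValueError("number of input channels must be divisible by r*r")
    channels_out = channels_in // (r * r)

    # Allocate the upsampled output, filled with zeros.
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]

    for c in range(channels_out):
        for h in range(height):
            for w in range(width):
                # Each group of r*r input channels fills one r-by-r output patch.
                for i in range(r):
                    for j in range(r):
                        out[c][h * r + i][w * r + j] = x[c * r * r + i * r + j][h][w]
    return out


if __name__ == "__main__":
    # Four 1x1 channels become a single 2x2 channel.
    print(pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], r=2))
```

The appeal of this design is that the convolution runs at low resolution, and the upsampling itself is a reordering of values that involves no arithmetic.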