12 Apr 2024 | Lucas Relic, Roberto Azevedo, Markus Gross, and Christopher Schroers
This paper proposes a novel lossy image compression codec based on foundation latent diffusion models, which achieves high-quality reconstructions at low bitrates. The method leverages the similarity between quantization error and noise, using diffusion models to recover lost information in the transmitted image latent. By performing only a fraction of the full diffusion generative process, the approach requires no architectural changes to the diffusion model, enabling the use of foundation models as a strong prior without additional fine-tuning. The proposed codec outperforms previous methods in quantitative realism metrics and is qualitatively preferred by end users, even when other methods use twice the bitrate.
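The analogy between quantization error and noise can be illustrated with a small numerical experiment. The sketch below is not taken from the paper: it assumes a plain uniform scalar quantizer and a synthetic Gaussian "latent", and the names `quantize` and `quantization_error_variance` are hypothetical. It checks that the rounding error behaves like zero-mean noise with variance close to the textbook value of step²/12.

```python
import random
import statistics


def quantize(values, step):
    """Uniform scalar quantization: snap each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]


def quantization_error_variance(values, step):
    """Empirical variance of the error introduced by quantize()."""
    quantized = quantize(values, step)
    errors = [q - v for q, v in zip(quantized, values)]
    return statistics.pvariance(errors)


# Synthetic stand-in for an image latent (assumption: unit-variance Gaussian values).
rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

for step in (0.25, 0.5, 1.0):
    measured = quantization_error_variance(latent, step)
    predicted = step ** 2 / 12  # variance of a uniform error on [-step/2, step/2]
    print(f"step={step}: measured={measured:.5f}, predicted={predicted:.5f}")
```

Coarser quantization (a larger step, hence a lower bitrate) produces a larger error variance, which is why a lower bitrate would plausibly call for more denoising.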
The codec combines four components: the autoencoder of a foundation latent diffusion model, which transforms the input image into a lower-dimensional latent space; a learned adaptive quantization and entropy encoder for bitrate control; a learned predictor of the ideal denoising timestep; and a diffusion decoding process that synthesizes the information lost during quantization. The key innovation is that the number of denoising steps adapts to the target bitrate, so only a fraction of the full diffusion process is run, which significantly reduces computational cost while maintaining high-quality reconstructions.
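To make the idea of a bitrate-dependent denoising timestep concrete, the following sketch picks a starting timestep by matching the quantization error variance to the noise level of a diffusion schedule. This is a hand-written stand-in, not the learned predictor described in the paper. It assumes a DDPM-style linear beta schedule with 1000 steps, a unit-variance latent, and the step²/12 approximation for the error of a uniform quantizer; `alpha_bar_schedule` and `match_timestep` are hypothetical names.

```python
def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed here)."""
    alpha_bars = []
    product = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def match_timestep(error_variance, alpha_bars):
    """Smallest 1-based timestep whose noise-to-signal ratio reaches error_variance.

    In x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, the noise-to-signal variance
    ratio is (1 - a) / a, which grows monotonically with t.
    """
    for t, a in enumerate(alpha_bars, start=1):
        if (1.0 - a) / a >= error_variance:
            return t
    return len(alpha_bars)


alpha_bars = alpha_bar_schedule()
for step in (0.25, 0.5, 1.0):
    error_variance = step ** 2 / 12  # uniform-quantizer approximation (assumption)
    t = match_timestep(error_variance, alpha_bars)
    print(f"quantization step {step}: start denoising at t={t} of {len(alpha_bars)}")
```

Under these assumptions, a coarser quantizer maps to a later starting timestep, yet the decoder still runs only a small part of the full schedule, consistent with the cost saving described above.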
The method is evaluated on several datasets, including Kodak, CLIC2022, and MS-COCO 30k, using objective metrics such as PSNR, MS-SSIM, LPIPS, and FID, as well as a subjective user study. The results show that the proposed method achieves state-of-the-art visual quality and is preferred by users over existing methods. The method is also more efficient in terms of decoding time and training budget compared to previous diffusion codecs.
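Of the objective metrics listed, PSNR is the simplest to state: it is ten times the base-10 logarithm of the squared peak value divided by the mean squared error. The sketch below is a minimal illustration on flat lists of 8-bit pixel values; the function name `psnr` and the sample data are made up for the example and are not the paper's evaluation code.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical inputs: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up pixel values, for illustration only.
original = [52, 55, 61, 66, 70, 61, 64, 73]
reconstructed = [53, 54, 61, 67, 69, 61, 65, 72]
print(f"PSNR: {psnr(original, reconstructed):.2f} dB")
```

PSNR measures pixel-wise fidelity only, which is why perceptual and realism metrics such as LPIPS and FID are reported alongside it.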
The paper highlights the potential of foundation diffusion models in image compression, demonstrating their ability to produce realistic and detailed reconstructions at low bitrates. The method is fundamentally independent of the chosen foundation model, making it a promising approach for future research in lossy image compression. However, the method has limitations, such as the potential for inaccurate reconstruction in certain cases and ethical concerns related to the generation of content at very low bitrates.