[slides and audio] MISC: Ultra-Low Bitrate Image Semantic Compression Driven by Large Multimodal Model

This paper proposes Multimodal Image Semantic Compression (MISC), a method for ultra-low bitrate image compression. MISC leverages Large Multimodal Models (LMMs) to balance consistency with the ground truth against perceptual quality. The framework has four components: an LMM encoder that extracts semantic information, a map encoder that localizes the regions corresponding to those semantics, an image encoder that produces an extremely compressed bitstream, and a decoder that reconstructs the image from all three.
Experimental results show that MISC achieves the best consistency and perceptual quality among the compared methods while saving 50% of the bitrate, making it suitable for both traditional Natural Sense Images (NSIs) and emerging AI-Generated Images (AIGIs). It outperforms existing compression algorithms on both consistency and perception at ultra-low bitrates, with MISC-3 achieving the best performance. The paper also introduces a new AIGI Semantic Compression Database (AIGI-SCD) for evaluating AIGI compression algorithms. The method is validated through extensive experiments and user studies, which show that it addresses the trade-off between consistency and perceptual quality in real-world scenarios. The proposed approach has strong potential for future image compression applications.
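
To make the notion of "ultra-low bitrate" concrete, here is a minimal sketch of how the bitrate of a three-part payload could be measured in bits per pixel (bpp). It assumes the encoded image consists of one byte string per encoder named above; the `MiscPayload` class, the `bits_per_pixel` function, and all byte sizes are hypothetical illustrations, not the authors' code or results from the paper.

```python
from dataclasses import dataclass


@dataclass
class MiscPayload:
    """Hypothetical container for the three encoder outputs (not the authors' format)."""
    semantic_text: bytes     # output of the LMM encoder (semantic information)
    spatial_map: bytes       # output of the map encoder (spatial localization)
    compressed_image: bytes  # output of the image encoder (extreme compression)

    def total_bits(self) -> int:
        # Every transmitted byte counts toward the bitrate.
        total_bytes = (
            len(self.semantic_text)
            + len(self.spatial_map)
            + len(self.compressed_image)
        )
        return 8 * total_bytes


def bits_per_pixel(payload: MiscPayload, width: int, height: int) -> float:
    """Return the bitrate in bits per pixel for an image of the given size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return payload.total_bits() / (width * height)


# Made-up sizes chosen only to illustrate the arithmetic.
example = MiscPayload(
    semantic_text=b"x" * 120,
    spatial_map=b"x" * 200,
    compressed_image=b"x" * 1000,
)
example_bpp = bits_per_pixel(example, width=512, height=512)
print(f"{example_bpp:.4f} bpp")
```

With these illustrative sizes, 1320 bytes spread over a 512 x 512 image come to roughly 0.04 bpp. Counting all three parts matters because the text and map are sent in addition to the compressed image.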

MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

Feb 2024 | Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Senior Member, IEEE, Weisi Lin, Fellow, IEEE, Wenjun Zhang, Fellow, IEEE