16 May 2024 | Giordano Cicchetti*, Eleonora Grassucci*, Jihong Park†, Jinho Choi†, Sergio Barbarossa*, and Danilo Comminiello*
This paper proposes a novel language-oriented semantic communication (SC) framework for image transmission. The framework communicates both text and a compressed image embedding, combining them using a latent diffusion model to reconstruct the intended image. The approach aims to overcome the limitations of traditional text-based SC, where text descriptions are too coarse to capture complex visual features like spatial locations, color, and texture, leading to significant perceptual differences between the original and reconstructed images.
The proposed framework uses an image-to-text (I2T) model to generate a textual caption of the image and an image encoder to create a latent representation of the image. Both the caption and the latent embedding are then transmitted over a noisy channel. At the receiver, a latent diffusion model regenerates the image from the received text and latent embedding. This yields reconstructions that are more accurate and perceptually closer to the original than those of traditional text-based SC methods.
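To make the "noisy channel" step concrete, the following is a minimal sketch of how a latent embedding might be perturbed in transit. It assumes an additive white Gaussian noise (AWGN) channel parameterized by a signal-to-noise ratio in dB; the summary above does not specify the channel model, so this is an illustrative assumption. The function names `awgn_channel` and `mse` are hypothetical, and the latent vector is a random stand-in rather than the output of a real image encoder.

```python
import math
import random


def awgn_channel(latent, snr_db, rng):
    """Add white Gaussian noise to a latent vector at a target SNR in dB.

    Assumption: an AWGN channel; the paper's channel model may differ.
    """
    # Average power of the transmitted vector.
    signal_power = sum(x * x for x in latent) / len(latent)
    # Noise power implied by the SNR definition: SNR = P_signal / P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in latent]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Stand-in for an image encoder's output (hypothetical size and range).
latent = [rng.uniform(-1.0, 1.0) for _ in range(4096)]

received_good = awgn_channel(latent, snr_db=20, rng=rng)  # mild noise
received_poor = awgn_channel(latent, snr_db=0, rng=rng)   # noise as strong as signal

print("MSE at 20 dB:", mse(latent, received_good))
print("MSE at  0 dB:", mse(latent, received_poor))
```

In a full pipeline, the received (noisy) latent and the received caption would then be handed to the diffusion model at the receiver; that step is omitted here because it depends on the specific model used.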
The framework is validated on the Flickr-8k dataset, where it achieves higher perceptual similarity over noisy communication channels than a baseline SC method that uses only text. The transmitted payload amounts to only 2.09% of the original image size, highlighting the method's bandwidth efficiency. The results show that combining text and latent embeddings improves image reconstruction, especially under poor channel conditions. The framework also supports adaptive transmission: only text is sent when bandwidth is limited or channel conditions are poor, and the latent embedding is added when network conditions improve.
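The adaptive transmission idea can be sketched as a simple sender-side decision rule. The code below is an illustration under stated assumptions, not the authors' policy: the SNR threshold `min_snr_db`, the byte budget, the 4-bytes-per-value encoding, and the names `Payload` and `select_payload` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Payload:
    caption: str
    latent: Optional[Sequence[float]]  # None when only text is sent


def select_payload(caption, latent, snr_db, budget_bytes,
                   min_snr_db=5.0, bytes_per_value=4):
    """Choose what to transmit given channel quality and a bandwidth budget.

    Assumptions (not from the paper): a fixed SNR threshold and a
    fixed-width encoding of each latent value.
    """
    caption_bytes = len(caption.encode("utf-8"))
    latent_bytes = len(latent) * bytes_per_value
    fits_budget = caption_bytes + latent_bytes <= budget_bytes
    # Send the latent only when the channel is good enough and it fits.
    if snr_db >= min_snr_db and fits_budget:
        return Payload(caption, latent)
    # Otherwise fall back to text-only transmission.
    return Payload(caption, None)


caption = "a dog runs on the beach"
latent = [0.0] * 1024  # placeholder embedding, 4096 bytes at 4 bytes per value

good = select_payload(caption, latent, snr_db=10.0, budget_bytes=8192)
narrow = select_payload(caption, latent, snr_db=10.0, budget_bytes=1000)
noisy = select_payload(caption, latent, snr_db=0.0, budget_bytes=8192)

print(good.latent is not None)  # text and latent are both sent
print(narrow.latent is None)    # budget too small: text only
print(noisy.latent is None)     # channel too poor: text only
```

A threshold rule is only one possible design; a real system might instead vary how much of the latent is sent or how it is protected as conditions change.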
The proposed framework leverages the power of latent diffusion models to generate high-quality images from text and latent embeddings, providing a more efficient and effective solution for semantic communication. The method is flexible and can be adapted to different communication scenarios, making it a promising approach for future semantic communication systems.