[slides] AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

**Authors:** Jonas Ricker, Denis Lukovnikov, Asja Fischer

**Institution:** Ruhr University Bochum

**Emails:** {jonas.ricker, denis.lukovnikov, asja.fischer}@rub.de

**Abstract:** Recent advances in text-to-image models have enabled the generation of highly realistic images, which creates a significant risk of visual disinformation. Latent diffusion models (LDMs) have become a key enabler for generating high-resolution images at low computational cost. However, their forensic analysis is still in its infancy. This paper introduces AEROBLADE, a detection method that leverages the autoencoder (AE) that LDMs use to transform images between the image and latent spaces. The method exploits the fact that the AE reconstructs generated images more accurately than real images, which leads to a simple detection approach based on the reconstruction error. AEROBLADE is easy to implement, requires no training, and achieves performance comparable to extensively trained detectors. Empirical results demonstrate its effectiveness against state-of-the-art LDMs, including Stable Diffusion and Midjourney. Beyond detection, AEROBLADE provides qualitative insights for identifying inpainted regions within real images.

**Contributions:**

- AEROBLADE: a simple, training-free method for detecting LDM-generated images based on AE reconstruction error.
- An empirical evaluation showing that AEROBLADE effectively distinguishes real images from images generated by seven state-of-the-art LDMs.
- A qualitative analysis showing that AEROBLADE helps identify inpainted regions within real images.

**Introduction:** The emergence of powerful text-to-image models like Stable Diffusion and Midjourney has made it easy to generate hyperrealistic images. While these models offer creative possibilities, they also pose risks, such as the erosion of trust in legitimate sources as synthetic media proliferates. LDMs, which use pre-trained AEs to transform images between the image and latent spaces, have scaled the generation process to high resolutions while keeping computational costs low. However, their forensic analysis is underdeveloped. AEROBLADE uses the AE reconstruction error to distinguish real from generated images, achieving high precision without training.

**Methodology:** AEROBLADE computes the reconstruction error of an image using the AE of an LDM, defined as the distance between the original image and its reconstruction. Because generated images are reconstructed more accurately than real ones, a low reconstruction error indicates a generated image. The method is effective against a wide range of models and can be extended to new ones. It achieves a mean average precision (AP) of 0.992 on various state-of-the-art models, performing almost as well as extensively trained classifiers.

**Experiments:**

- **Setup:** AEROBLADE is evaluated on images from seven text-to-image LDMs, including Stable Diffusion, Kandinsky, and Midjour
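
The reconstruction-error test described under Methodology can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `encode`/`decode` pair, the `mse` distance, and the `threshold` value are all hypothetical stand-ins, since the summary only specifies "the distance between the original image and its reconstruction". A real detector would plug in the pre-trained AE of an LDM and a suitable image distance in their place.

```python
from typing import Callable, List, Sequence

# Assumption: an "image" is a flattened grayscale signal of even length.
Image = Sequence[float]


def encode(image: Image) -> List[float]:
    """Toy encoder (hypothetical): average each pair of neighbouring pixels."""
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image) - 1, 2)]


def decode(latent: Sequence[float]) -> List[float]:
    """Toy decoder (hypothetical): repeat each latent value twice."""
    return [value for value in latent for _ in range(2)]


def mse(a: Image, b: Image) -> float:
    """Mean squared error, used here only as a placeholder distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_error(
    image: Image,
    encoder: Callable[[Image], Sequence[float]] = encode,
    decoder: Callable[[Sequence[float]], Image] = decode,
    distance: Callable[[Image, Image], float] = mse,
) -> float:
    """Distance between an image and its autoencoder reconstruction."""
    return distance(image, decoder(encoder(image)))


def looks_generated(image: Image, threshold: float) -> bool:
    """Flag an image as generated when its reconstruction error is low.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return reconstruction_error(image) < threshold


if __name__ == "__main__":
    # An image produced by the decoder itself, standing in for a generated image.
    generated_like = decode([0.2, 0.8, 0.5])
    # An image with pixel-level detail the latent cannot represent.
    real_like = [0.1, 0.9, 0.3, 0.7, 0.6, 0.2]
    for name, img in [("generated-like", generated_like), ("real-like", real_like)]:
        print(name, reconstruction_error(img), looks_generated(img, threshold=0.01))
```

In this toy setup, an input that already lies in the decoder's output range reconstructs with zero error, while an input carrying detail the latent cannot hold does not. The threshold separates these two cases, which mirrors the intuition behind the method without reproducing its actual components.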

27 Mar 2024 | Jonas Ricker, Denis Lukovnikov, Asja Fischer