Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets


March 29, 2024 | Patrick Grommelt, Louis Weiss, Franz-Josef Pfreundt, Janis Keuper
The paper "Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets" by Patrick Grommelt, Louis Weiss, Franz-Josef Pfreundt, and Janis Keuper explores the biases present in datasets used for detecting AI-generated images. The authors highlight that many datasets, such as the *GenImage* dataset, contain biases related to JPEG compression and image size, which can impact the effectiveness and evaluation of detectors. They demonstrate that detectors trained on these datasets learn to identify these biases, leading to suboptimal performance when faced with images that do not exhibit these biases. The paper's main contributions include: 1. Demonstrating that detectors trained on *GenImage* learn to identify JPEG compression and image size biases. 2. Showcasing that removing these biases significantly enhances cross-generator performance, achieving state-of-the-art results with over 11 percentage points improvement for *ResNet50* and *Swin-T* detectors on *GenImage*. The authors provide a detailed analysis of the biases in JPEG compression and image size, showing how these biases affect detector performance. They also present methods to mitigate these biases, such as constraining the training dataset to specific JPEG quality factors and controlling image sizes. The results indicate that removing these biases improves the robustness and generalization of detectors, making them more effective in real-world scenarios. The paper concludes by emphasizing the importance of creating unbiased datasets and suggests that detector models should be trained on the same natural images as their corresponding generative models to ensure better performance and reliability.The paper "Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets" by Patrick Grommelt, Louis Weiss, Franz-Josef Pfreundt, and Janis Keuper explores the biases present in datasets used for detecting AI-generated images. The authors highlight that many datasets, such as the *GenImage* dataset, contain biases related to JPEG compression and image size, which can impact the effectiveness and evaluation of detectors. They demonstrate that detectors trained on these datasets learn to identify these biases, leading to suboptimal performance when faced with images that do not exhibit these biases. The paper's main contributions include: 1. Demonstrating that detectors trained on *GenImage* learn to identify JPEG compression and image size biases. 2. Showcasing that removing these biases significantly enhances cross-generator performance, achieving state-of-the-art results with over 11 percentage points improvement for *ResNet50* and *Swin-T* detectors on *GenImage*. The authors provide a detailed analysis of the biases in JPEG compression and image size, showing how these biases affect detector performance. They also present methods to mitigate these biases, such as constraining the training dataset to specific JPEG quality factors and controlling image sizes. The results indicate that removing these biases improves the robustness and generalization of detectors, making them more effective in real-world scenarios. The paper concludes by emphasizing the importance of creating unbiased datasets and suggests that detector models should be trained on the same natural images as their corresponding generative models to ensure better performance and reliability.