DeepFake Detection Based on High-Frequency Enhancement Network for Highly Compressed Content

2024 | Jie Gao, Zhaoqiang Xia, Gian Luca Marcialis, Chen Dang, Jing Dai, Xiaoyi Feng
This paper proposes a High-Frequency Enhancement (HiFE) framework for detecting DeepFakes in highly compressed content. Its core is a learnable, adaptive high-frequency enhancement network that enriches the weak high-frequency information left in compressed content, without supervision from uncompressed data. The framework has three branches: a Basic branch operating in the RGB domain, a Local High-Frequency Enhancement branch based on the block-wise Discrete Cosine Transform (DCT), and a Global High-Frequency Enhancement branch based on the multi-level Discrete Wavelet Transform (DWT). The Local branch combines DCT coefficients with a channel attention mechanism to indirectly achieve adaptive, frequency-aware multi-spatial attention. The Global branch supplements high-frequency information by extracting coarse-to-fine multi-scale high-frequency cues from DWT coefficients and fusing them across levels with a cascade-residual scheme. A Two-Stage Cross-Fusion module integrates the outputs of all three branches, strengthening the weak high-frequency information in low-quality data. Experiments on the FaceForensics++, Celeb-DF, and OpenForensics datasets show that the method outperforms existing state-of-the-art detectors, especially on low-quality data, with a small computational burden. The main contributions are an analysis of the differences between low-quality and high-quality data, the HiFE network for low-quality data, and a Two-Stage Cross-Fusion strategy that models the complementarity between different domains.
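To make the starting point of the Local branch more concrete, the following is a minimal sketch of a block-wise DCT in NumPy. It is not the authors' implementation: the 8x8 block size, the `u + v >= cutoff` rule for what counts as "high frequency", and the function names `dct_matrix`, `blockwise_dct`, and `high_frequency_energy` are all assumptions made for illustration. The learnable channel attention described in the summary is not modelled here.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    mat = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    mat[0, :] = np.sqrt(1.0 / n)      # the DC row uses a different scale
    return mat


def blockwise_dct(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Apply a 2-D DCT independently to each block x block tile.

    `image` is a 2-D (grayscale) array. The result has shape
    (H // block, W // block, block, block).
    """
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    d = dct_matrix(block)
    # Split the image into a grid of tiles: (rows, cols, block, block).
    tiles = image.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    # The 2-D DCT of a tile X is D @ X @ D.T; matmul broadcasts over the grid.
    return d @ tiles @ d.T


def high_frequency_energy(coeffs: np.ndarray, cutoff: int = 4) -> np.ndarray:
    """Per-block energy of coefficients whose index sum u + v >= cutoff.

    The cutoff is an illustrative choice, not a value from the paper.
    """
    block = coeffs.shape[-1]
    u, v = np.meshgrid(np.arange(block), np.arange(block), indexing="ij")
    mask = (u + v) >= cutoff
    return np.sum((coeffs ** 2) * mask, axis=(-2, -1))
```

Comparing `high_frequency_energy` for the same face crop at different compression levels is one simple way to observe how compression weakens the high-frequency content that the framework aims to enhance.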
According to the authors, the method is general and flexible because it can be embedded into any network. The design of HiFE is motivated by a series of analyses showing how important high-frequency information is for detecting low-quality synthetic content. Evaluation on three differently compressed versions of the FaceForensics++ dataset shows that the performance of DeepFake detection models degrades as compression increases. Experiments with information removal, exchange, and re-learning strategies further demonstrate the role of high-frequency information in DeepFake detection.
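The multi-level DWT behind the Global branch, and the notion of "high-frequency information" that the analyses above rely on, can be illustrated with a small sketch. It assumes an orthonormal Haar wavelet and three decomposition levels; the summary specifies neither, and the names `haar_dwt2`, `multilevel_haar`, and `high_freq_energy_ratio` are hypothetical. The cascade-residual fusion of the sub-bands is not modelled.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """One level of the 2-D orthonormal Haar DWT.

    Returns (approx, details), where details is a tuple of three
    high-frequency sub-bands. Each sub-band has half the height and width.
    """
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("both sides must be even")
    s = np.sqrt(2.0)
    # Pairwise sum and difference of adjacent rows (filtering along axis 0).
    lo = (x[0::2, :] + x[1::2, :]) / s
    hi = (x[0::2, :] - x[1::2, :]) / s
    # Pairwise sum and difference of adjacent columns (filtering along axis 1).
    approx = (lo[:, 0::2] + lo[:, 1::2]) / s
    d_cols = (lo[:, 0::2] - lo[:, 1::2]) / s   # changes across columns
    d_rows = (hi[:, 0::2] + hi[:, 1::2]) / s   # changes across rows
    d_diag = (hi[:, 0::2] - hi[:, 1::2]) / s   # diagonal changes
    return approx, (d_cols, d_rows, d_diag)


def multilevel_haar(x: np.ndarray, levels: int = 3):
    """Decompose x repeatedly; details are listed from fine to coarse."""
    approx = np.asarray(x, dtype=float)
    bands = []
    for _ in range(levels):
        approx, details = haar_dwt2(approx)
        bands.append(details)
    return approx, bands


def high_freq_energy_ratio(x: np.ndarray, levels: int = 3) -> float:
    """Fraction of the signal energy that lies in the detail sub-bands."""
    total = float(np.sum(np.asarray(x, dtype=float) ** 2))
    if total == 0.0:
        return 0.0
    _, bands = multilevel_haar(x, levels)
    detail = sum(float(np.sum(b ** 2)) for level in bands for b in level)
    return detail / total
```

Because the transform is orthonormal, the energies of the final approximation and all detail sub-bands add up to the energy of the input, so the ratio lies between 0 and 1. A flat image gives a ratio of 0, while fine textures and sharp edges push it upward.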