4 Sep 2018 | Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen
This paper presents a method for automatically and efficiently detecting face tampering in videos, focusing on two recent techniques: Deepfake and Face2Face. Traditional image forensics techniques are poorly suited to videos because compression strongly degrades the data they rely on. Instead, the paper proposes two deep learning networks with a small number of layers that focus on mesoscopic image properties. These networks are evaluated on an existing dataset and a newly created one, achieving detection rates of over 98% for Deepfake and 95% for Face2Face.
Deepfake replaces a person's face in a video with another person's. It trains autoencoders on facial images, which makes it possible to generate realistic face swaps. Face2Face, by contrast, transfers facial expressions from one person to another in real time using a simple RGB camera.
The proposed method analyzes videos at a mesoscopic level, avoiding the limitations of microscopic analysis in compressed videos and the difficulty of distinguishing forgeries at a semantic level. Two networks, Meso-4 and MesoInception-4, are introduced. Both have low parameter counts and achieve high classification scores.
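To give a feel for what a "low parameter count" means for a network with only a few layers, the sketch below counts the weights of a small convolutional classifier. The layer sizes (kernel widths, channel counts, pooling factors, a 256x256 RGB input) are illustrative assumptions and are not taken from the text above; batch normalization parameters are ignored.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights of a square kernel plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_features, out_features):
    """Weights of a fully connected layer plus one bias per output unit."""
    return in_features * out_features + out_features


def count_params(input_size, in_ch, conv_layers, dense_units):
    """Count parameters of a conv stack followed by dense layers.

    conv_layers is a list of (kernel, out_channels, pool) tuples.
    'Same' padding is assumed, so only pooling shrinks the feature map.
    """
    total = 0
    size, ch = input_size, in_ch
    for kernel, out_ch, pool in conv_layers:
        total += conv2d_params(kernel, ch, out_ch)
        size //= pool          # spatial size after max pooling
        ch = out_ch
    features = size * size * ch  # flattened feature vector length
    for units in dense_units:
        total += dense_params(features, units)
        features = units
    return total


# Hypothetical four-layer configuration (an assumption for illustration).
HYPOTHETICAL_LAYERS = [(3, 8, 2), (5, 8, 2), (5, 16, 2), (5, 16, 4)]

total = count_params(256, 3, HYPOTHETICAL_LAYERS, dense_units=[16, 1])
print(total)  # 27881
```

Under these assumptions the whole classifier has fewer than 30,000 weights, several orders of magnitude below typical large image classification networks, which is what keeps the computational cost low.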
Experiments on the Deepfake and Face2Face datasets show that the networks perform well, with MesoInception-4 achieving a detection rate of over 98% on the Deepfake dataset. The method is robust to video compression, and aggregating frame predictions over a video further improves detection accuracy. Inspection of the trained networks also indicates that the eyes and mouth play a key role in detecting Deepfake forgeries, as these areas tend to be less detailed in forged images.
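The aggregation of frame predictions can be sketched as follows. This is a minimal illustration that assumes simple averaging of per-frame scores, a score convention where values near 1 mean "forged", and a 0.5 decision threshold; the function name and the sample scores are hypothetical.

```python
from statistics import mean


def classify_video(frame_scores, threshold=0.5):
    """Aggregate per-frame forgery scores into one video-level decision.

    frame_scores: per-frame classifier outputs in [0, 1], where values
    near 1 are assumed to mean "forged".
    Returns (video_score, is_forged).
    """
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    video_score = mean(frame_scores)
    return video_score, video_score >= threshold


# Hypothetical scores for four frames of one forged video.
scores = [0.875, 0.375, 0.75, 0.5]

# Frame by frame, the second frame would be misclassified as real.
per_frame = [s >= 0.5 for s in scores]

# Averaged over the video, the decision is still "forged".
video_score, is_forged = classify_video(scores)
print(per_frame, video_score, is_forged)
```

Averaging lets confident frames outweigh an occasional low-quality or misleading frame, which is one plausible reason why video-level accuracy can exceed frame-level accuracy.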
The paper concludes that the proposed networks are effective for detecting face tampering in videos with low computational cost. The method is validated on real-world conditions, showing high detection rates for both Deepfake and Face2Face videos. The study also highlights the importance of understanding deep learning models to evaluate their effectiveness and limitations.