The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

DECEMBER XXXX | Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun
The paper introduces the Codecfake dataset and proposes the CSAM strategy for detecting ALM-based deepfake audio. Audio generated by audio language models (ALMs) is challenging for current audio deepfake detection (ADD) models because of its widespread use, high deceptiveness, and versatility of types. The Codecfake dataset contains over one million audio samples in two languages, constructed with seven neural codec models, and covers a range of test conditions, including unseen codec-based audio and in-the-wild ALM-based audio. The CSAM strategy is designed to learn a domain-balanced and well-generalized minimum, improving detection performance. In experiments, models trained on the Codecfake dataset with the CSAM strategy achieve the lowest average equal error rate (EER) of 0.616% across all test conditions. The dataset and code are available online.

The paper also reviews related work on vocoder-based and codec-based deepfake audio and on audio deepfake detection datasets. Codecfake is designed to address the limitations of traditional ADD datasets, which are built primarily from vocoder-generated audio. ADD models trained on Codecfake achieve better results than their vocoder-trained counterparts. The paper further analyzes the impact of codec settings on ADD performance and evaluates the effectiveness of ADD models on ALM-based audio. The results show that the Codecfake dataset and the CSAM strategy significantly improve the detection of ALM-based deepfake audio.
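The summary describes CSAM as learning a "domain-balanced and generalized minimum", which suggests a sharpness-aware minimization (SAM) style update. The paper's actual CSAM procedure is not detailed in this summary, so the following is only a hedged illustration of a vanilla SAM step: ascend to a locally worst-case point, then descend from the original weights using the gradient taken there.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) step:
    1) perturb the weights toward the worst-case point within an
       L2 ball of radius rho (first-order approximation),
    2) update the ORIGINAL weights using the gradient evaluated at
       that perturbed point, biasing training toward flat minima."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    g_sharp = grad_fn(w + eps)                   # gradient at perturbed point
    return w - lr * g_sharp

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(200):
    w = sam_step(w, lambda v: v)
```

On this toy loss the iterates shrink toward the minimum at the origin; in real use `grad_fn` would be the training-loss gradient of a detection model, and CSAM (per the paper) additionally balances domains during training.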
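The headline metric above is equal error rate (EER). As a reminder of what the reported 0.616% measures, here is a minimal EER computation over detector scores; this is an illustrative helper with an assumed brute-force threshold sweep, not the paper's evaluation code.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal error rate: the operating point where the false acceptance
    rate (spoof accepted as bona fide) equals the false rejection rate
    (bona fide rejected). Convention: higher score = more bona fide."""
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))  # threshold where the rates cross
    return (far[idx] + frr[idx]) / 2
```

For perfectly separated score distributions the EER is 0; an EER of 0.616% means roughly 6 errors per 1,000 trials at the crossing threshold.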