MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

16 Apr 2024 | Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger
The paper introduces the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), a comprehensive dataset designed to address the limitations of existing anti-spoofing datasets, which focus predominantly on English and Chinese audio. MLAAD is created using 54 text-to-speech (TTS) models, generating 163.9 hours of synthetic speech in 23 languages.

The dataset is evaluated by training three state-of-the-art deepfake detection models on it; these generalize better than models trained on datasets such as In-the-Wild or FakeOrReal. MLAAD also complements ASVspoof 2019: across eight evaluation datasets, models trained on each alternately outperform those trained on the other. The authors aim to democratize anti-spoofing technology by making MLAAD and the trained models accessible via an interactive webserver, contributing to global efforts against audio spoofing and deepfakes.

The dataset's quality is assessed using automatic speaker recognition tools, and the results indicate that the synthesized audio is comparable to the original audio in many cases, especially for languages present in the M-AILABS corpus. Underrepresented languages such as Hindi, Maltese, Swahili, and Ukrainian show higher error rates, highlighting the need for better TTS support for these languages.
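To make the dataset-creation step concrete, below is a minimal sketch of per-language synthetic speech generation using the open-source Coqui TTS package, whose models the MLAAD authors draw on. The model names are examples from Coqui's public model registry and the sample sentences are placeholders; the exact 54 models and source texts used in MLAAD are listed in the paper.

```python
# Sketch: generating synthetic speech in several languages, in the spirit of
# the MLAAD pipeline. Assumes the Coqui TTS package (pip install TTS).
from TTS.api import TTS

# One (language, model) pair per target language; MLAAD covers 23 languages.
# These registry names are illustrative, not the authors' exact model list.
MODELS = {
    "de": "tts_models/de/thorsten/vits",
    "fr": "tts_models/fr/mai/tacotron2-DDC",
    "en": "tts_models/en/ljspeech/vits",
}

SAMPLE_TEXT = {
    "de": "Dies ist ein synthetischer Beispielsatz.",
    "fr": "Ceci est une phrase d'exemple synthétique.",
    "en": "This is a synthetic example sentence.",
}

for lang, model_name in MODELS.items():
    tts = TTS(model_name=model_name)  # downloads weights on first use
    tts.tts_to_file(text=SAMPLE_TEXT[lang],
                    file_path=f"mlaad_demo_{lang}.wav")
```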
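The quality assessment reports error rates from an automatic speaker recognition system. As a reference for how such a number is typically computed, here is a minimal sketch of the equal error rate (EER), the operating point where the false accept rate equals the false reject rate; the scores and labels below are hypothetical, not taken from the paper.

```python
# Sketch: equal error rate (EER), a standard speaker-verification metric.
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """EER: threshold-free point where false accepts == false rejects."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # closest crossing point
    return (fpr[idx] + fnr[idx]) / 2

# Hypothetical trial scores: label 1 = same speaker, 0 = different speaker.
labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.65, 0.3, 0.45, 0.2, 0.7, 0.55])
print(f"EER: {compute_eer(labels, scores):.2%}")
```

A low EER for a synthetic-vs-original speaker comparison indicates the TTS output preserves speaker characteristics well, which is why the paper's higher EERs for Hindi, Maltese, Swahili, and Ukrainian point to weaker TTS support for those languages.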