RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

This paper proposes RawBMamba, an end-to-end bidirectional state space model for audio deepfake detection. The model combines short- and long-range feature information to distinguish genuine from fake audio. It uses a series of parametric sinc functions and multiple convolutional layers to capture short-range features, followed by a bidirectional Mamba to capture long-range features. RawBMamba is designed to address the unidirectional modeling limitation of Mamba, so that both forward and backward long-range features are captured. A bidirectional fusion module then merges these embeddings into a comprehensive representation that enriches the audio context and combines short- and long-range information.

Experiments cover the ASVspoof2019 LA, ASVspoof2021 LA, and ASVspoof2021 DF datasets. On ASVspoof2021 LA, RawBMamba achieves a 34.1% improvement over Rawformer, and it is competitive on the other datasets. It outperforms other end-to-end models, including Rawformer and RawMamba, and performs close to the multi-system score fusion T23 on the 21DF dataset. t-SNE visualizations show that Mamba features are more discriminable than Transformer features, which indicates the effectiveness of the Mamba architecture for audio deepfake detection. The bidirectional Mamba and fusion module also enhance the model's generalization ability and robustness to out-of-domain data.
The results demonstrate that RawBMamba is a promising approach for audio deepfake detection, with potential for future applications in provenance tasks.
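
To illustrate why a second, time-reversed scan matters for a state space model, here is a minimal sketch in plain Python. It is not the RawBMamba implementation. It assumes a scalar linear state space model with fixed parameters `a`, `b`, `c` (Mamba itself uses learned, input-dependent parameters), and it uses elementwise addition as a stand-in for the fusion module, whose details are not given in this summary. The function names `ssm_scan` and `bidirectional_scan` are hypothetical.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Causal scan of a scalar linear state space model.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Each output depends only on the current and earlier inputs.
    """
    h = 0.0
    y = []
    for x_t in x:
        h = a * h + b * x_t
        y.append(c * h)
    return y


def bidirectional_scan(
    x: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0
) -> List[float]:
    """Run the scan forward and backward, then fuse the two outputs.

    The backward branch flips the sequence, scans it, and flips the
    result back so it is aligned with the forward branch in time.
    Fusion by addition is an assumption made for this sketch only.
    """
    forward = ssm_scan(x, a, b, c)
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(ssm_scan(impulse, 0.5, 1.0, 1.0))  # forward only
    print(bidirectional_scan(impulse))       # forward + backward
```

With an impulse in the middle of the sequence, the forward-only scan is zero at every position before the impulse, while the fused output is non-zero on both sides of it. This is the sense in which a bidirectional model sees context that a unidirectional scan cannot.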

RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

18 Jun 2024 | Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan