10 May 2024 | Rong Chao†§, Wen-Huang Cheng§, Moreno La Quatra†, Sabato Marco Siniscalchi†, Chao-Han Huck Yang*, Szu-Wei Fu*, Yu Tsao‡
This paper investigates the use of Mamba, a scalable state-space model (SSM), for speech enhancement (SE). The authors develop SEMamba, an SE system built on Mamba, and evaluate it in both basic and advanced SE configurations, integrating Mamba with signal-level distances and metric-oriented loss functions. SEMamba attains a PESQ score of 3.55 on the VoiceBank-DEMAND dataset, improving to 3.69 when combined with perceptual contrast stretching (PCS). Compared with Transformer-based models, Mamba delivers comparable or superior performance while requiring fewer computational resources. The study also examines the impact of bi-directional Mamba, consistency loss (CL), and PCS on SE performance, showing that SEMamba with PCS achieves a state-of-the-art (SOTA) PESQ score and highlighting the potential of Mamba for advanced SE tasks.
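For readers unfamiliar with the bi-directional variant mentioned above, the sketch below illustrates one common way to make a causal Mamba layer bi-directional: run two scans, one over the sequence and one over its reversal, then fuse the outputs. This is a minimal sketch assuming the open-source `mamba-ssm` package (which typically requires a CUDA-capable GPU); the `BiMambaBlock` class and its concatenate-and-project fusion are illustrative assumptions, not the paper's exact SEMamba architecture.

```python
# Hedged sketch: bi-directional Mamba via two opposite-direction scans.
# Assumes the `mamba-ssm` package; BiMambaBlock and the fusion scheme
# are illustrative choices, not the authors' published design.
import torch
import torch.nn as nn
from mamba_ssm import Mamba


class BiMambaBlock(nn.Module):
    """Runs Mamba over a sequence in both directions and fuses the results."""

    def __init__(self, d_model: int):
        super().__init__()
        self.fwd = Mamba(d_model=d_model)            # left-to-right scan
        self.bwd = Mamba(d_model=d_model)            # right-to-left scan
        self.proj = nn.Linear(2 * d_model, d_model)  # fuse both directions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), e.g. frame-level spectrogram features
        y_fwd = self.fwd(x)
        # Flip time axis, scan, then flip back so frames align with y_fwd.
        y_bwd = self.bwd(x.flip(dims=[1])).flip(dims=[1])
        return self.proj(torch.cat([y_fwd, y_bwd], dim=-1))


# Usage: a batch of 4 sequences, 200 frames, 64 features per frame.
block = BiMambaBlock(d_model=64)
out = block(torch.randn(4, 200, 64))  # -> (4, 200, 64)
```

The design choice here mirrors why bi-directionality matters for SE: a single causal scan only sees past frames, whereas enhancement is usually performed offline on whole utterances, so letting a second scan consume future context can improve the estimate at each frame.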