DUAL-PATH MAMBA: SHORT AND LONG-TERM BIDIRECTIONAL SELECTIVE STRUCTURED STATE SPACE MODELS FOR SPEECH SEPARATION

1 May 2024 | Xilin Jiang, Cong Han, Nima Mesgarani
The paper introduces Dual-Path Mamba (DPMamba), a model for speech separation that leverages selective state space models (SSMs) to efficiently model long speech sequences. DPMamba combines the strengths of Mamba, a selective SSM, with a dual-path architecture to capture both short-term and long-term dependencies in speech signals. The model addresses the computational and memory inefficiency of transformers, whose self-attention operations scale quadratically with sequence length. DPMamba chunks the signal into smaller segments for easier processing and scans it in both the forward and backward directions using bidirectional Mamba blocks. Experimental results on the WSJ0-2mix dataset show that DPMamba models of comparable or smaller size outperform state-of-the-art models from different architecture families, including the CNN-based Wavesplit, the RNN-based DPRNN, and the transformer-based Sepformer. The paper also discusses the efficiency of DPMamba, highlighting its potential for mobile devices due to its low GPU memory consumption. Future work will focus on improving the efficiency of the Mamba block and integrating it with other network layers to further improve performance.
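
To make the architecture concrete, below is a minimal PyTorch sketch of the dual-path bidirectional Mamba design described above. It is not the authors' implementation: the residual-plus-LayerNorm wiring, the summation of the forward and backward scans, the layer sizes, and the use of the mamba_ssm package's Mamba layer are all assumptions made for illustration.

# Minimal sketch of a dual-path bidirectional Mamba block.
# Hypothetical, not the authors' code: the residual wiring, layer sizes,
# and the `mamba_ssm` dependency (pip install mamba-ssm; its selective
# scan kernel typically requires a CUDA GPU) are assumptions.
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class BiMamba(nn.Module):
    """Bidirectional Mamba: one Mamba scans the sequence forward, a second
    scans the time-reversed sequence; the two outputs are summed."""
    def __init__(self, d_model: int):
        super().__init__()
        self.fwd = Mamba(d_model=d_model)
        self.bwd = Mamba(d_model=d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, time, d_model)
        y = self.fwd(x) + self.bwd(x.flip(1)).flip(1)
        return self.norm(y + x)                 # residual connection (assumed)

class DualPathBlock(nn.Module):
    """An intra-chunk BiMamba models short-term structure within each chunk;
    an inter-chunk BiMamba models long-term structure across chunks."""
    def __init__(self, d_model: int):
        super().__init__()
        self.intra = BiMamba(d_model)
        self.inter = BiMamba(d_model)

    def forward(self, x):                       # x: (batch, n_chunks, chunk_len, d_model)
        b, s, k, d = x.shape
        x = self.intra(x.reshape(b * s, k, d)).reshape(b, s, k, d)
        x = x.transpose(1, 2).reshape(b * k, s, d)    # sequence now runs over chunks
        x = self.inter(x).reshape(b, k, s, d).transpose(1, 2)
        return x

# Example: an encoded mixture split into 10 chunks of 25 frames each.
block = DualPathBlock(d_model=128)
out = block(torch.randn(2, 10, 25, 128))        # -> (2, 10, 25, 128)

In a full separator, a stack of such blocks would sit between an encoder that segments the speech signal into (possibly overlapping) chunks and a decoder that merges the chunks back and reconstructs the separated sources.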