Revisiting Multi-modal Emotion Learning with Broad State Space Models and Probability-guidance Fusion

3 May 2024 | Yuntao Shou, Tao Meng, Fuchen Zhang, Nan Yin, Keqin Li
This paper proposes a novel method for Multi-modal Emotion Recognition in Conversation (MERC) that integrates a Broad Learning System (BLS) with State Space Models (SSMs) to improve emotion recognition in conversations. The method comprises two key stages: feature disentanglement and feature fusion. In the feature disentanglement stage, a Broad Mamba module uses SSMs to model long-range contextual semantic information and a BLS to explore potential data distributions in a broad space. This design avoids the quadratic complexity of self-attention and makes long-distance context modeling more efficient. In the feature fusion stage, a probability-guided fusion mechanism uses each modality's predicted label probabilities to dynamically assign fusion weights, maximizing the consistency of information across modalities. The method is evaluated on two widely used benchmark datasets, IEMOCAP and MELD.
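As an illustration only, the PyTorch sketch below shows (a) a minimal linear-time state-space recurrence followed by a fixed random "broad" expansion, loosely in the spirit of the Broad Mamba described above, and (b) a probability-guided fusion layer that weights modalities by their own predicted label probabilities. All class names, shapes, and the random-feature expansion are assumptions made for this sketch; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BroadSSMBlock(nn.Module):
    """Illustrative sketch (not the paper's exact layer): a diagonal state-space
    recurrence for linear-time context modeling, followed by a frozen random
    projection standing in for the Broad Learning System mapping."""

    def __init__(self, dim: int, state_dim: int = 16, broad_dim: int = 256):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(dim, state_dim))        # negative log-decay rates
        self.B = nn.Parameter(torch.randn(dim, state_dim) * 0.1)
        self.C = nn.Parameter(torch.randn(dim, state_dim) * 0.1)
        self.broad = nn.Linear(dim, broad_dim)                    # stand-in for BLS nodes
        for p in self.broad.parameters():
            p.requires_grad_(False)                               # keep the broad mapping fixed
        self.out = nn.Linear(broad_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); the recurrence is O(seq_len), not O(seq_len^2).
        batch, seq_len, dim = x.shape
        decay = torch.exp(self.A)                                 # per-channel decay in (0, 1]
        h = x.new_zeros(batch, dim, decay.shape[-1])              # hidden state per channel
        ys = []
        for t in range(seq_len):
            u = x[:, t, :].unsqueeze(-1)                          # (batch, dim, 1)
            h = decay * h + self.B * u                            # h_t = A * h_{t-1} + B * x_t
            ys.append((h * self.C).sum(-1))                       # y_t = C * h_t -> (batch, dim)
        y = torch.stack(ys, dim=1)                                # (batch, seq_len, dim)
        return self.out(torch.tanh(self.broad(y)))                # broad expansion + readout


class ProbabilityGuidedFusion(nn.Module):
    """Illustrative sketch: fuse modalities with weights derived from each
    modality's predicted label probabilities (its classification confidence)."""

    def __init__(self, dim: int, num_classes: int, num_modalities: int = 3):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(dim, num_classes) for _ in range(num_modalities)
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: per-modality utterance features, each of shape (batch, dim).
        probs = [F.softmax(head(f), dim=-1) for head, f in zip(self.heads, feats)]
        conf = torch.stack([p.max(dim=-1).values for p in probs], dim=-1)  # (batch, M)
        weights = F.softmax(conf, dim=-1)                                  # normalize over modalities
        return sum(w.unsqueeze(-1) * f for w, f in zip(weights.unbind(-1), feats))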
Experimental results show that the method achieves state-of-the-art performance with low computational cost, outperforming Transformer- and GNN-based models. With only 1.73M parameters, far fewer than competing methods, it keeps inference time below 10 seconds on both datasets. Ablation studies confirm that multi-modal feature fusion significantly improves emotion recognition, t-SNE projections of the learned representations show clear separation between emotion categories, and error analysis indicates fewer misclassifications than competing methods.