Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum


2024 | Tao Meng, Fuchen Zhang, Yuntao Shou, Wei Ai, Nan Yin, Keqin Li
This paper revisits multimodal emotion recognition in conversation (MERC) from the perspective of the graph spectrum. The proposed method, GS-MCC, captures consistent and complementary semantic information in multimodal conversations through graph spectral analysis. GS-MCC first extracts text, audio, and visual features with RoBERTa, openSMILE, and a 3D-CNN, then encodes them with GRUs and fully connected networks to obtain higher-order utterance representations. A sliding window is used to construct a fully connected graph that models conversational relationships, and efficient Fourier graph operators extract long-distance high- and low-frequency information. Contrastive learning then constructs self-supervised signals that encourage complementary and consistent semantic collaboration between the high- and low-frequency signals, improving their ability to reflect real emotions. Finally, the collaborative high- and low-frequency information is fed into an MLP and a softmax function for emotion prediction.

The contributions of this work are: (1) an efficient long-distance information learning module that uses Fourier graph operators to capture high- and low-frequency information; (2) an efficient high- and low-frequency information collaboration module that uses contrastive learning to sharpen the ability of different frequency bands to distinguish emotions; and (3) extensive comparative and ablation experiments on two benchmark datasets, IEMOCAP and MELD, demonstrating the method's effectiveness in capturing long-distance context dependencies and improving MERC performance.

The paper also discusses limitations of existing methods, such as over-smoothing in GNNs and under-utilization of high-frequency features. GS-MCC addresses these issues by using Fourier graph operators to capture long-distance high- and low-frequency information and contrastive learning to promote collaboration between the two frequency bands. The proposed method outperforms existing methods on both IEMOCAP and MELD, demonstrating its effectiveness for multimodal emotion recognition.
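To make the pipeline concrete, the following sketches illustrate its main steps. First, the sliding-window graph construction: each utterance is connected to all utterances within a fixed window around it in the conversation. The window size and the undirected, self-looped edges are illustrative assumptions; the paper's construction may additionally encode speaker identity or edge direction.

```python
import torch

def sliding_window_adjacency(num_utterances: int, window: int) -> torch.Tensor:
    """Binary adjacency matrix connecting each utterance to all utterances
    within +/- `window` positions in the conversation (self-loops included).
    Illustrative sketch; GS-MCC's exact construction may differ."""
    idx = torch.arange(num_utterances)
    # Utterances i and j are connected iff |i - j| <= window.
    adj = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window
    return adj.float()

# Example: 6 utterances, window of 2.
A = sliding_window_adjacency(6, 2)
print(A)
```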
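The Fourier graph operator step can likewise be sketched. The idea behind Fourier-domain graph learning is to mix information across all utterance nodes with a learnable filter applied after an FFT over the node axis, giving long-range propagation at O(n log n) cost. The per-channel complex weight and the fixed frequency cutoff used to split the low and high bands below are illustrative assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class FourierGraphOperator(nn.Module):
    """Mixes information across utterance nodes via a learnable filter in
    the frequency domain (FFT over the node axis). The low/high band split
    is an assumption about how GS-MCC separates frequency information."""
    def __init__(self, dim: int):
        super().__init__()
        # One learnable complex weight per feature channel (kept simple here).
        self.weight = nn.Parameter(torch.randn(dim, dtype=torch.cfloat) * 0.02)

    def forward(self, x: torch.Tensor, keep_low: bool = True,
                cutoff: float = 0.5) -> torch.Tensor:
        # x: (num_nodes, dim) utterance features.
        n = x.size(0)
        xf = torch.fft.fft(x, dim=0)                    # spectrum over nodes
        freq = torch.fft.fftfreq(n).abs().to(x.device)  # |frequency| per bin
        if keep_low:
            band = freq <= cutoff * freq.max()
        else:
            band = freq > cutoff * freq.max()
        xf = xf * self.weight                           # learnable spectral filter
        xf = xf * band.unsqueeze(-1).to(xf.dtype)       # zero out the other band
        return torch.fft.ifft(xf, dim=0).real

# Two views of the same utterances, e.g.:
#   z_low  = fgo(x, keep_low=True)
#   z_high = fgo(x, keep_low=False)
```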
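For the high-/low-frequency collaboration module, a symmetric InfoNCE objective is a natural stand-in: the two frequency views of the same utterance form a positive pair, and all other utterances act as negatives. The temperature and the symmetric form are assumptions; GS-MCC's exact self-supervised loss may differ.

```python
import torch
import torch.nn.functional as F

def infonce_loss(z_low: torch.Tensor, z_high: torch.Tensor,
                 tau: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE between low- and high-frequency views of the same
    utterances. Generic stand-in for GS-MCC's collaboration loss."""
    z_low = F.normalize(z_low, dim=-1)
    z_high = F.normalize(z_high, dim=-1)
    logits = z_low @ z_high.t() / tau  # (n, n) scaled cosine similarities
    targets = torch.arange(z_low.size(0), device=z_low.device)
    # Matching diagonal entries are positives in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```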
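Finally, the collaborative high- and low-frequency representations go through an MLP and softmax for emotion prediction. A minimal head, assuming simple concatenation of the two bands (the paper's fusion may differ):

```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """MLP + softmax classifier over fused frequency representations.
    Concatenating z_low and z_high is an illustrative fusion choice."""
    def __init__(self, dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, z_low: torch.Tensor, z_high: torch.Tensor) -> torch.Tensor:
        logits = self.mlp(torch.cat([z_low, z_high], dim=-1))
        return logits.softmax(dim=-1)  # per-utterance emotion distribution
```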