31 Mar 2024 | Taeyang Yun, Hyunkuk Lim, Jeonghwan Lee, Min Song*
The paper introduces TelME, a multimodal framework for emotion recognition in conversation (ERC) designed to make non-verbal modalities more useful in ERC tasks. TelME uses cross-modal knowledge distillation: a language model acting as the teacher transfers information to the non-verbal modalities, namely audio and visual features, so that they contribute more to emotion recognition. The framework then uses an attention-based modality shifting fusion approach to combine the enhanced non-verbal features with the teacher's representation. Experiments on the MELD and IEMOCAP datasets show that TelME outperforms existing ERC methods on both datasets and achieves state-of-the-art performance, particularly in multi-party conversational scenarios: on MELD it improves on the previous state of the art by 0.66%. The paper also includes an ablation study and a detailed analysis of each modality's contribution, which highlight the effectiveness of the proposed techniques.
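The summary names two mechanisms without showing how they operate. The sketch below illustrates them under stated assumptions and should not be read as TelME's actual implementation. For distillation it uses a generic Hinton-style response loss, the temperature-scaled KL divergence from teacher to student class distributions. For fusion it uses a simple dot-product attention in which the teacher's text vector attends over the audio and visual vectors and is then shifted by the attended result. The function names `kd_loss` and `shift_fusion`, along with the `temperature` and `alpha` parameters, are hypothetical illustrations.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: Vector, student_logits: Vector,
            temperature: float = 2.0) -> float:
    """Generic response distillation loss: T^2 * KL(p_teacher || p_student).

    Assumption: a standard Hinton-style loss standing in for TelME's own
    distillation objective, which may differ.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the text teacher
    q = softmax(student_logits, temperature)  # audio or visual student prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def shift_fusion(text_vec: Vector, nonverbal_vecs: List[Vector],
                 alpha: float = 0.5) -> Vector:
    """Hypothetical attention-based 'shift' of the teacher's text vector.

    The text vector is the query; the non-verbal (student) vectors serve as
    both keys and values. Their attention-weighted sum is added to the text
    vector, scaled by alpha. This is an illustrative stand-in, not TelME's module.
    """
    d = len(text_vec)
    scores = [dot(text_vec, v) / math.sqrt(d) for v in nonverbal_vecs]
    weights = softmax(scores)
    shift = [sum(w * v[i] for w, v in zip(weights, nonverbal_vecs))
             for i in range(d)]
    return [t + alpha * s for t, s in zip(text_vec, shift)]


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # toy text-teacher logits over 3 emotions
    student = [1.0, 1.0, 0.0]    # toy audio-student logits
    print("KD loss:", kd_loss(teacher, student))

    text = [1.0, 0.0]
    audio, visual = [0.5, 0.5], [0.0, 1.0]
    print("fused:", shift_fusion(text, [audio, visual]))
```

In this sketch the teacher's representation remains the base of the fused vector and the non-verbal features only displace it. That mirrors the summary's description of fusing enhanced non-verbal features into the teacher's representation, but the exact weighting is an assumption.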