Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, Rada Mihalcea
The paper introduces MELD (Multimodal EmotionLines Dataset), a multimodal, multi-party dataset for emotion recognition in conversations. MELD extends and enhances the EmotionLines dataset, containing about 13,000 utterances from 1,433 dialogues drawn from the TV series *Friends*. Each utterance is annotated with both an emotion and a sentiment label and is available in audio, visual, and textual modalities. The authors propose several strong multimodal baselines and demonstrate the importance of contextual and multimodal information for emotion recognition in conversations. The dataset is publicly available, and the paper covers its contributions, dataset construction, feature extraction, baseline models, experimental results, and future directions. The results show that multimodal fusion improves performance, particularly for sentiment classification, while emotion classification remains challenging because of the fine-grained nature of the task. The paper also highlights the role of context and inter-speaker influence in emotion recognition and suggests future research directions, including improved contextual modeling and better use of visual information.
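To make the dataset structure and the idea of multimodal fusion concrete, here is a minimal sketch of how one might consume MELD's per-utterance annotations and train a simple early-fusion classifier. The CSV column names (`Utterance`, `Speaker`, `Emotion`, `Sentiment`, `Dialogue_ID`, `Utterance_ID`) and file names are assumed to follow the publicly released annotation files; the audio features below are random placeholders standing in for acoustic descriptors, and this is not the paper's own contextual baseline (e.g., bcLSTM).

```python
# Minimal sketch: early fusion of text and (placeholder) audio features on MELD.
# Assumes the released annotation CSV layout; not the paper's bcLSTM baseline.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def load_split(path):
    """Read one MELD split and sort utterances into dialogue order."""
    df = pd.read_csv(path)
    return df.sort_values(["Dialogue_ID", "Utterance_ID"]).reset_index(drop=True)

# Hypothetical local paths to the released annotation files.
train = load_split("train_sent_emo.csv")
test = load_split("test_sent_emo.csv")

# Text features: TF-IDF over the utterance transcripts.
vectorizer = TfidfVectorizer(max_features=5000)
X_text_train = vectorizer.fit_transform(train["Utterance"]).toarray()
X_text_test = vectorizer.transform(test["Utterance"]).toarray()

# Audio features: random stand-ins; in practice these would come from an
# acoustic feature extractor applied to the utterance clips.
rng = np.random.default_rng(0)
X_audio_train = rng.normal(size=(len(train), 64))
X_audio_test = rng.normal(size=(len(test), 64))

# Early fusion: concatenate the per-utterance modality features.
X_train = np.hstack([X_text_train, X_audio_train])
X_test = np.hstack([X_text_test, X_audio_test])

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train["Emotion"])
pred = clf.predict(X_test)
print("weighted F1:", f1_score(test["Emotion"], pred, average="weighted"))
```

Note that this sketch classifies each utterance in isolation; the paper's findings suggest that modeling the surrounding dialogue context and inter-speaker influence is what drives much of the gain on MELD, which a per-utterance classifier cannot capture.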