4 Jun 2019 | Soujanya Poria†, Devamanyu Hazarika‡, Navonil Majumder‡, Gautam Naik§, Erik Cambria§, Rada Mihalcea†
The MELD dataset is a large-scale multimodal multi-party dataset for emotion recognition in conversations. It extends and enhances the EmotionLines dataset, containing about 13,000 utterances from 1,433 dialogues from the TV series Friends. Each utterance is annotated with emotion and sentiment labels, and encompasses audio, visual, and textual modalities. The dataset is available for use at http://affective-meld.github.io.
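For reference, a minimal sketch of loading the released splits and inspecting the labels. The file name and column names below are assumptions about how the public release is organized; adjust them to the files actually obtained from http://affective-meld.github.io.

```python
import pandas as pd

# Assumed file name and column names for the training split; verify against
# the files downloaded from http://affective-meld.github.io before use.
train = pd.read_csv("train_sent_emo.csv")

# Each row is one utterance with its dialogue context and labels.
print(train[["Dialogue_ID", "Utterance_ID", "Utterance",
             "Emotion", "Sentiment"]].head())

# Emotion and sentiment label distributions across the ~13,000 utterances.
print(train["Emotion"].value_counts())
print(train["Sentiment"].value_counts())
```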
The MELD dataset was developed to address the lack of large-scale multimodal conversational datasets for emotion recognition in conversations (ERC). Because every utterance comes with audio, visual, and textual sources, it can support grounded learning in multimodal affective dialogue systems. Unlike the dyadic conversations in earlier datasets, MELD contains multi-party conversations, which are more challenging to classify, and its more than 13,000 utterances make it nearly double the size of existing multimodal conversational datasets.
The MELD dataset was created by extracting the starting and ending timestamps of all utterances from every dialogue in the EmotionLines dataset. The dataset was then re-annotated by three annotators, with a majority voting scheme applied to select a final emotion label for each utterance. The overall Fleiss' kappa score of this annotation process was 0.43, which is higher than the original EmotionLines annotation whose kappa score was 0.34.
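To make the annotation aggregation concrete, the sketch below applies majority voting over three annotators and computes Fleiss' kappa on toy data. The seven-emotion label set follows MELD, but the tie-handling rule and the helper names are illustrative assumptions, not the authors' exact procedure.

```python
from collections import Counter

EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]

def majority_label(votes):
    """Return the majority emotion among annotator votes, or None on a tie.
    Tie handling here is an assumption; the paper applies majority voting but
    its exact tie-breaking rule is not reproduced."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

def fleiss_kappa(votes_per_item, categories=EMOTIONS):
    """Fleiss' kappa for a fixed number of raters per item."""
    n_raters = len(votes_per_item[0])
    n_items = len(votes_per_item)
    # Count matrix: items x categories.
    counts = [[row.count(c) for c in categories] for row in votes_per_item]
    # Per-item observed agreement.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from marginal category proportions.
    totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# Toy example: three annotators labelling two utterances.
votes = [
    ["joy", "joy", "neutral"],
    ["anger", "anger", "anger"],
]
print([majority_label(v) for v in votes])  # ['joy', 'anger']
print(round(fleiss_kappa(votes), 3))
```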
The MELD dataset pairs visual, audio, and textual modalities for each utterance with strong baseline experiments for emotion recognition in conversations. The baselines include text-CNN, bcLSTM, and DialogueRNN, and the results show that multimodal models outperform their unimodal counterparts on this task.
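As a rough illustration of the utterance-level text baseline, here is a minimal Kim-style text-CNN in PyTorch that maps token ids to the seven MELD emotion classes. The hyperparameters (embedding size, filter widths, filter count, dropout) are assumptions and do not reproduce the reported baseline configuration; context modeling (bcLSTM, DialogueRNN) and multimodal fusion are omitted.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Utterance-level CNN classifier over token embeddings.
    Hyperparameters are illustrative, not the paper's exact settings."""
    def __init__(self, vocab_size, n_classes=7, emb_dim=300,
                 filter_sizes=(3, 4, 5), n_filters=100, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in filter_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(n_filters * len(filter_sizes), n_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, emb, seq)
        # Convolve with each filter width, then max-pool over time.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                       # utterance-level logits

# Smoke test with random token ids.
model = TextCNN(vocab_size=5000)
logits = model(torch.randint(1, 5000, (4, 20)))
print(logits.shape)  # torch.Size([4, 7])
```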
The MELD dataset has several applications, including training emotion classifiers for use in generative dialogue systems, emotion and personality modeling of users in conversations, and training multimodal dialogue systems. The dataset is also useful for future research in the design of efficient multimodal fusion algorithms, novel ERC frameworks, and the extraction of new features from the audio, visual, and textual modalities.