Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph

Melbourne, Australia, July 15 - 20, 2018 | Amir Zadeh, Paul Pu Liang, Jonathan Vanbriesen, Soujanya Poria, Edmund Tong, Erik Cambria, Minghai Chen, Louis-Philippe Morency
This paper introduces the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset, the largest dataset for multimodal sentiment analysis and emotion recognition to date. The dataset contains 23,453 annotated video segments from 1,000 distinct speakers and 250 topics, with manual transcriptions aligned to the audio at the phoneme level. The paper also presents a novel interpretable fusion technique called the Dynamic Fusion Graph (DFG), which is used to investigate how modalities interact in human multimodal language. DFG is highly interpretable and achieves competitive performance compared to current state-of-the-art models. The authors analyze the efficacy of DFG over time, revealing insights into the dynamics of multimodal fusion. The results show that DFG selectively prioritizes certain modalities and learns priors about human communication, demonstrating that it effectively manages its internal structure to model human multimodal language.