Tensor Fusion Network for Multimodal Sentiment Analysis

23 Jul 2017 | Amir Zadeh, Minghai Chen, Soujanya Poria, Erik Cambria, Louis-Philippe Morency
This paper introduces Tensor Fusion Network (TFN), a model for multimodal sentiment analysis designed to capture both intra-modality and inter-modality dynamics, particularly for spoken language in online videos along with the accompanying gestures and voice. TFN outperforms state-of-the-art approaches in both multimodal and unimodal sentiment analysis.

Multimodal sentiment analysis combines the language, visual, and acoustic modalities to determine sentiment. The challenge lies in modeling the interactions between these modalities, which can change how sentiment is perceived. For example, the utterance "This movie is sick" is ambiguous on its own, but its sentiment becomes clear from the accompanying gestures or tone of voice.

TFN consists of three main components: the Modality Embedding Subnetworks, the Tensor Fusion Layer, and the Sentiment Inference Subnetwork. The Modality Embedding Subnetworks process language, visual, and acoustic features to produce rich unimodal embeddings. The Tensor Fusion Layer explicitly models unimodal, bimodal, and trimodal interactions by taking a 3-fold Cartesian product of the embeddings (sketched in the code example after this summary). The Sentiment Inference Subnetwork then performs sentiment prediction on the output of the Tensor Fusion Layer.

The model was evaluated on the CMU-MOSI dataset, which contains opinion videos from YouTube movie reviews. TFN outperformed previous state-of-the-art approaches in both multimodal and unimodal sentiment analysis, and each of the three Modality Embedding Subnetworks on its own outperformed unimodal state-of-the-art approaches. These results indicate that TFN effectively captures the complex interactions between modalities, and that its handling of the volatile nature of spoken language, where proper grammatical structure is often ignored, makes it particularly well suited to multimodal sentiment analysis.
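Below is a minimal sketch of the Tensor Fusion Layer described above, assuming PyTorch and illustrative embedding sizes (the dimensions, function name, and surrounding subnetworks are assumptions for illustration, not the authors' exact implementation). Each modality embedding is extended with a constant 1 before the outer product, so the resulting tensor contains the unimodal terms, all bimodal products, and the trimodal product.

```python
import torch

def tensor_fusion(z_l, z_v, z_a):
    """Sketch of the 3-fold Cartesian-product fusion.

    Appending a 1 to each modality embedding means the outer product
    holds unimodal, bimodal, and trimodal interaction terms at once.
    """
    batch = z_l.size(0)
    ones = torch.ones(batch, 1)
    # Extend each embedding with a constant 1: shape (batch, d_m + 1)
    zl = torch.cat([z_l, ones], dim=1)
    zv = torch.cat([z_v, ones], dim=1)
    za = torch.cat([z_a, ones], dim=1)
    # Outer product across the three modalities -> (batch, d_l+1, d_v+1, d_a+1)
    fused = torch.einsum('bi,bj,bk->bijk', zl, zv, za)
    # Flatten so a downstream inference subnetwork can consume it
    return fused.flatten(start_dim=1)

# Hypothetical embedding sizes for illustration
z_l = torch.randn(8, 128)   # language embedding
z_v = torch.randn(8, 32)    # visual embedding
z_a = torch.randn(8, 32)    # acoustic embedding
print(tensor_fusion(z_l, z_v, z_a).shape)  # torch.Size([8, 140481])
```

Flattening the fused tensor is one reasonable way to feed it into a small feed-forward Sentiment Inference Subnetwork; the actual network sizes and training details are described in the paper itself.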