WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge

20 Feb 2024 | Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, Dacheng Tao
WisdoM is a plug-in framework that enhances multimodal sentiment analysis (MSA) by integrating contextual world knowledge from large vision-language models (LVLMs). The framework consists of three stages: prompt template generation, context generation, and contextual fusion. In the first stage, prompt templates are generated with language models such as ChatGPT to guide LVLMs toward producing relevant contextual information. In the second stage, an LVLM generates context from the given image and sentence. In the third stage, a training-free contextual fusion mechanism reduces the noise introduced by the generated context and improves sentiment classification accuracy.

Experiments on benchmark datasets such as Twitter2015, Twitter2017, and MSED show that WisdoM consistently improves performance, achieving an average +1.96% F1 score gain over existing methods. The gains are most pronounced on samples with ambiguous or complex sentiment, indicating that contextual world knowledge is crucial for accurate sentiment analysis. Because it is a plug-in, WisdoM adapts to different model architectures and modalities, and it outperforms existing approaches in both accuracy and robustness.
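To make the three stages concrete, below is a minimal sketch of the pipeline in Python. The interfaces `lvlm.generate` and `msa_model`, the prompt wording, and the entropy-weighted fusion rule are all illustrative assumptions for exposition; the paper's exact contextual fusion rule may differ. The key idea shown is that the fusion is training-free: it only combines the class distributions produced with and without the generated context.

```python
# Hypothetical sketch of WisdoM's three stages; not the authors' actual code.
import torch
import torch.nn.functional as F

# Stage 1 (assumed wording): a prompt template, e.g. drafted with ChatGPT,
# that asks the LVLM for sentiment-relevant background knowledge.
PROMPT_TEMPLATE = (
    "Given the image and the sentence below, describe the background "
    "knowledge (people, events, places) needed to judge the sentiment.\n"
    "Sentence: {sentence}"
)

def generate_context(lvlm, image, sentence):
    """Stage 2: fill the template and query the LVLM for free-form context."""
    prompt = PROMPT_TEMPLATE.format(sentence=sentence)
    return lvlm.generate(image=image, prompt=prompt)

def contextual_fusion(logits_plain, logits_ctx, alpha=0.5):
    """Stage 3 (training-free): fuse predictions made without / with context.

    Illustrative rule: weight the context-augmented distribution by its
    confidence (low entropy => high weight), so that noisy contexts are
    down-weighted. This weighting scheme is an assumption.
    """
    p_plain = F.softmax(logits_plain, dim=-1)
    p_ctx = F.softmax(logits_ctx, dim=-1)
    # Normalized entropy of the context branch, in [0, 1].
    ent = -(p_ctx * p_ctx.clamp_min(1e-12).log()).sum(-1)
    ent = ent / torch.log(torch.tensor(float(p_ctx.size(-1))))
    w = alpha * (1.0 - ent)  # trust the context more when it is confident
    return (1.0 - w).unsqueeze(-1) * p_plain + w.unsqueeze(-1) * p_ctx

def wisdom_predict(msa_model, lvlm, image, sentence):
    """End-to-end: generate context, score both branches, fuse, classify."""
    context = generate_context(lvlm, image, sentence)
    logits_plain = msa_model(image=image, text=sentence)
    logits_ctx = msa_model(image=image, text=sentence + " " + context)
    return contextual_fusion(logits_plain, logits_ctx).argmax(dim=-1)
```

Because the fusion operates purely on output distributions, the sketch requires no retraining of the underlying MSA model, which is what makes the framework plug-and-play across architectures.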