MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition


21 Apr 2024 | Zheng Lian, Licai Sun, Yong Ren, Hao Gu, Haiyang Sun, Lan Chen, Bin Liu, Jianhua Tao
This paper introduces MERBench, a unified evaluation benchmark for multimodal emotion recognition. MERBench provides a comprehensive framework for comparing methods in this field under a single experimental setup, covering the primary datasets, unimodal and multimodal features, and multimodal fusion strategies. Within this setup, the authors study key open problems: feature selection, multimodal fusion strategies, cross-corpus performance, noise robustness, and the impact of missing punctuation. The authors position MERBench as the most comprehensive benchmark in this field, spanning feature selection, multimodal fusion, cross-corpus performance, robustness analysis, and language sensitivity analysis.

The paper also introduces MER2023, a new Chinese emotion dataset designed as a benchmark for research on multi-label learning, noise robustness, and semi-supervised learning. It contains three subsets: a multi-label subset for studying correlations between discrete and dimensional labels, a noisy subset for evaluating noise robustness, and an unlabeled subset for studying semi-supervised learning.

Evaluating a range of unimodal and multimodal features across several datasets, the benchmark reports the following main findings. Multimodal fusion consistently improves performance over unimodal baselines, and among the fusion algorithms compared, the attention mechanism achieves the best overall results (a fusion sketch follows this summary). Increasing the number of features used in fusion generally enhances results, and the high-performance feature set outperforms weaker alternatives. Acoustic and lexical encoders whose pretraining language matches the target corpus consistently achieve better performance. Features that perform well within a corpus generally also transfer well across corpora (see the cross-corpus sketch below), although different datasets convey emotions in distinct ways.
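To make the strongest fusion strategy concrete, the sketch below shows one common form of attention-based fusion over utterance-level unimodal features: each modality is projected into a shared space, weighted by a learned attention score, and summed before classification. This is a minimal PyTorch sketch, not the paper's implementation; the feature dimensions, hidden size, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Minimal attention-based fusion over unimodal utterance features.

    Each modality's feature vector is projected to a shared hidden size,
    a scalar attention score is computed per modality, and the fused
    representation is the attention-weighted sum of the projections.
    """

    def __init__(self, feat_dims, hidden_dim=256, num_classes=6):
        super().__init__()
        # One projection per modality (e.g., audio, visual, lexical).
        self.proj = nn.ModuleList(nn.Linear(d, hidden_dim) for d in feat_dims)
        self.attn = nn.Linear(hidden_dim, 1)        # scalar score per modality
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats):
        # feats: list of tensors, one per modality, each (batch, feat_dim)
        h = torch.stack(
            [torch.tanh(p(x)) for p, x in zip(self.proj, feats)], dim=1
        )                                           # (batch, M, hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)      # (batch, M, 1)
        fused = (w * h).sum(dim=1)                  # (batch, hidden_dim)
        return self.classifier(fused)

# Example: fuse hypothetical audio/visual/text features of different sizes.
model = AttentionFusion(feat_dims=[768, 512, 1024], num_classes=6)
audio, visual, text = torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 1024)
logits = model([audio, visual, text])               # (4, 6)
```

Because the attention weights are computed per sample, the model can lean on different modalities for different utterances, which is one intuition for why attention fusion tends to beat plain concatenation.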
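The cross-corpus protocol behind that finding is simple to state: train on one corpus and test on another, for every ordered pair of corpora. The sketch below assumes precomputed utterance-level features and uses a linear probe as a stand-in classifier; the corpus names and random data are placeholders, not the benchmark's actual datasets or models.

```python
from itertools import permutations
import numpy as np
from sklearn.linear_model import LogisticRegression

def cross_corpus_eval(corpora):
    """Train on each corpus, test on every other one.

    corpora: dict mapping corpus name -> (features, labels), where
    features is (n_samples, n_dims) and labels is (n_samples,).
    Returns accuracy for every ordered (source, target) pair.
    """
    results = {}
    for src, tgt in permutations(corpora, 2):
        clf = LogisticRegression(max_iter=1000).fit(*corpora[src])
        results[(src, tgt)] = clf.score(*corpora[tgt])  # mean accuracy
    return results

# Toy example with random "corpora" standing in for real datasets.
rng = np.random.default_rng(0)
corpora = {
    name: (rng.normal(size=(100, 16)), rng.integers(0, 4, size=100))
    for name in ("corpus_a", "corpus_b", "corpus_c")
}
print(cross_corpus_eval(corpora))
```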