Speech emotion recognition via graph-based representations

2024 | Anastasia Pentari, George Kafentzis & Manolis Tsiknakis
This paper presents a novel approach to speech emotion recognition (SER) using graph-based representations. The method leverages graph theory to extract statistical and structural information from speech signals, which is then used as features for emotion classification. Evaluated with a Random Forest classifier under a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme, the approach outperforms existing methods on three public datasets: EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), with average increases in Unweighted Average Recall (UAR) of 18%, 8%, and 13%, respectively.

The paper first discusses the challenges of SER, including the variability of speech signals across speakers, languages, and cultures, and the limitations of traditional machine learning and deep learning approaches, motivating the need for alternative methods that can handle these challenges.

The proposed method derives graph-based features from two adjacency matrices: one encoding structural information and the other statistical information of the speech signal. These features are combined into a speaker-based emotional motif, a unique signature of each speaker's emotional state.

The method is compared against two state-of-the-art approaches, one based on hand-crafted features and one on deep learning architectures, and achieves higher classification accuracy, particularly on imbalanced datasets. A feature-importance analysis shows that the structural-based density and clustering coefficient are the most informative graph-based features for SER. In sum, the proposed method combines graph theory with machine learning to analyze speech signals.
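The two features the paper singles out, graph density and clustering coefficient, can be computed from any adjacency structure. The paper's exact adjacency-matrix constructions are not reproduced here; as a minimal sketch, a natural visibility graph (a common, generic way to map a sampled time series to a graph, and an assumption on our part, not the authors' method) stands in for the signal-to-graph step:

```python
# Sketch: map a sampled "speech frame" to a graph and compute the two
# features the paper reports as most informative (density, clustering).
# The visibility-graph construction below is a stand-in assumption, not
# the paper's actual structural/statistical adjacency matrices.
import math

def visibility_graph(x):
    """Adjacency sets of the natural visibility graph of samples x."""
    n = len(x)
    adj = {i: set() for i in range(n)}
    for a in range(n):
        for b in range(a + 1, n):
            # a and b are linked if every sample strictly between them
            # lies below the straight line joining (a, x[a]) and (b, x[b]).
            if all(x[c] < x[b] + (x[a] - x[b]) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                adj[a].add(b)
                adj[b].add(a)
    return adj

def density(adj):
    """Fraction of possible edges that are present: 2E / (N(N-1))."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * edges / (n * (n - 1))

def avg_clustering(adj):
    """Mean of per-node clustering coefficients (triangle density)."""
    coeffs = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

# Toy "frame": a few samples of a decaying sinusoid.
frame = [math.sin(0.7 * t) * math.exp(-0.05 * t) for t in range(32)]
g = visibility_graph(frame)
features = [density(g), avg_clustering(g)]
```

Per-frame feature vectors like `features` would then be aggregated per utterance before classification; the aggregation scheme is likewise not specified here.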
It provides a unique emotional identity for each speaker's emotional state and shows promising classification performance across the three datasets. The paper concludes that the proposed approach is more effective than existing methods and has the potential to improve emotion recognition in a range of real-world applications.
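The evaluation protocol named above (LOSO-CV scored with UAR) can be sketched in a few lines; the classifier itself (a Random Forest in the paper) is omitted here, and the helper names are ours:

```python
# Sketch of the evaluation protocol: Leave-One-Speaker-Out splits plus
# Unweighted Average Recall (UAR). Function names are illustrative.

def loso_splits(speakers):
    """Yield (held_out_speaker, train_idx, test_idx), one fold per speaker."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test

def uar(y_true, y_pred):
    """Mean of per-class recalls: minority classes in an imbalanced
    dataset count as much as majority ones, unlike plain accuracy."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

Because no speaker appears in both the train and test indices of a fold, LOSO-CV measures generalization to unseen speakers, which is exactly the speaker-variability challenge the paper highlights.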