The INTERSPEECH 2009 Emotion Challenge


2009 | Björn Schuller¹, Stefan Steidl², Anton Batliner²
The INTERSPEECH 2009 Emotion Challenge addressed the lack of standardized corpora and test conditions in emotion recognition from speech, a gap that hinders the reproducibility and comparability of results. It was based on the FAU Aibo Emotion Corpus, which provides spontaneous, emotionally colored German speech from children interacting with a pet robot and comprises 48,401 words. Two classification tasks were defined on this corpus: a five-class problem (Anger, Emphatic, Neutral, Positive, Rest) and a two-class problem (Negative vs. Idle).

The challenge comprised three sub-challenges: Open Performance, Classifier, and Feature. The Classifier Sub-Challenge provided a standard acoustic feature set of pitch, energy, and MFCC low-level descriptors, augmented with delta coefficients and summarized by statistical functionals, while the Feature Sub-Challenge invited participants to submit their own features. Baseline results, obtained with dynamic modeling via HMMs and static modeling via WEKA, reached 70.1% accuracy on the two-class problem and 65.1% on the five-class problem. The challenge emphasized the importance of realistic, spontaneous data, underscored the need for standardized evaluation methods, and illustrated the difficulty of moving from controlled laboratory settings to real-world applications. Its overall aim was to promote more realistic and comparable evaluations in emotion recognition research.
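The static modeling approach mentioned above maps a variable-length utterance to a fixed-length vector by applying statistical functionals to per-frame low-level descriptors (LLDs) such as energy and MFCCs. The sketch below illustrates that idea in pure Python; the particular functionals chosen (mean, standard deviation, min, max, range) and the toy frame values are illustrative assumptions, not the challenge's official feature set.

```python
import math

def functionals(frames):
    """Summarize a sequence of per-frame LLD vectors into one
    fixed-length feature vector by applying statistical functionals
    per dimension. Illustrative subset of functionals only."""
    n_dims = len(frames[0])
    out = []
    for d in range(n_dims):
        vals = [frame[d] for frame in frames]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        out.extend([
            mean,                     # arithmetic mean
            math.sqrt(var),           # standard deviation
            min(vals),                # minimum
            max(vals),                # maximum
            max(vals) - min(vals),    # range
        ])
    return out

# Hypothetical utterance: 4 frames, 2 LLD dimensions (e.g. energy, one MFCC)
frames = [[0.1, 5.0], [0.3, 4.0], [0.2, 6.0], [0.4, 5.0]]
vec = functionals(frames)
print(len(vec))  # 2 dims x 5 functionals = 10
```

However long the utterance, the output length depends only on the number of LLD dimensions and functionals, which is what lets a static classifier such as those in WEKA consume it directly.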