The INTERSPEECH 2009 Emotion Challenge


2009 | Björn Schuller¹, Stefan Steidl², Anton Batliner²
The INTERSPEECH 2009 Emotion Challenge addressed the lack of standardized corpora and test conditions in emotion recognition from speech, a gap that hinders the reproducibility and comparability of results. It was based on the FAU Aibo Emotion Corpus, which provides spontaneous, emotionally colored German speech from children interacting with a pet robot and comprises 48,401 words. Two classification tasks were defined on this corpus: a five-class problem (Anger, Emphatic, Neutral, Positive, Rest) and a two-class problem (Negative vs. Idle).

The challenge comprised three sub-challenges: Open Performance, Classifier, and Feature. The Classifier Sub-Challenge provided a standard acoustic feature set of pitch, energy, and MFCC low-level descriptors, augmented with delta coefficients and summarized by statistical functionals, while the Feature Sub-Challenge invited participants to submit their own features. Baseline results, obtained with dynamic modeling via HMMs and static modeling via WEKA, reached 70.1% accuracy on the two-class problem and 65.1% on the five-class problem. The challenge emphasized the importance of realistic, spontaneous data, underscored the need for standardized evaluation methods, and illustrated the difficulty of moving from controlled laboratory settings to real-world applications. Its overall aim was to promote more realistic and comparable evaluations in emotion recognition research.
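The static modeling approach mentioned above maps a variable-length utterance to a fixed-length vector by applying statistical functionals to per-frame low-level descriptors (LLDs) such as energy and MFCCs. The sketch below illustrates that idea in pure Python; the particular functionals chosen (mean, standard deviation, min, max, range) and the toy frame values are illustrative assumptions, not the challenge's official feature set.

```python
import math

def functionals(frames):
    """Summarize a sequence of per-frame LLD vectors into one
    fixed-length feature vector by applying statistical functionals
    per dimension. Illustrative subset of functionals only."""
    n_dims = len(frames[0])
    out = []
    for d in range(n_dims):
        vals = [frame[d] for frame in frames]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        out.extend([
            mean,                     # arithmetic mean
            math.sqrt(var),           # standard deviation
            min(vals),                # minimum
            max(vals),                # maximum
            max(vals) - min(vals),    # range
        ])
    return out

# Hypothetical utterance: 4 frames, 2 LLD dimensions (e.g. energy, one MFCC)
frames = [[0.1, 5.0], [0.3, 4.0], [0.2, 6.0], [0.4, 5.0]]
vec = functionals(frames)
print(len(vec))  # 2 dims x 5 functionals = 10
```

However long the utterance, the output length depends only on the number of LLD dimensions and functionals, which is what lets a static classifier such as those in WEKA consume it directly.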