openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor

openSMILE is an open-source audio feature extractor developed by the Institute for Human-Machine Communication at Technische Universität München. It combines feature extraction algorithms from the speech processing and Music Information Retrieval (MIR) communities. It supports a wide range of audio descriptors, including CHROMA, CENS, loudness, Mel-frequency cepstral coefficients (MFCC), perceptual linear predictive cepstral coefficients (PLP), linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies. Delta regression and various statistical functionals can be applied to these descriptors. openSMILE is implemented in C++ with no third-party dependencies for core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component-based architecture that allows easy extension via plugins. It supports both online incremental processing and offline batch processing. Unit tests ensure numeric compatibility with future versions. openSMILE can be downloaded from http://opensmile.sourceforge.net/. It is designed to be modality-independent, so physiological signals such as heart rate, EEG, or EMG can be analyzed with the same algorithms used for audio. It provides a simple, scriptable console application in which modular feature extraction components are configured and connected via a single configuration file. This allows efficient and customizable feature extraction without duplicating computations. openSMILE is compatible with research toolkits such as HTK, WEKA, and LibSVM because it supports their data formats. It is already used successfully by researchers worldwide for tasks such as emotion recognition and speech processing. openSMILE was the official feature extractor for the INTERSPEECH 2009 Emotion Challenge and the ongoing INTERSPEECH 2010 Paralinguistic Challenge.
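
To make the phrase "delta regression and various statistical functionals" more concrete, the following is a minimal Python sketch, not openSMILE code. It assumes the common HTK-style regression formula with a window of 2 frames and edge padding by repetition; the function names, the chosen functionals, and the loudness values are hypothetical and invented purely for illustration.

```python
import statistics


def delta_regression(contour, window=2):
    """Delta coefficients via HTK-style regression (an assumed formula).

    d[t] = sum_{n=1..W} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..W} n^2)
    Frames beyond either end are replaced by the first/last frame.
    """
    n_frames = len(contour)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(n_frames):
        acc = 0.0
        for n in range(1, window + 1):
            ahead = contour[min(t + n, n_frames - 1)]
            behind = contour[max(t - n, 0)]
            acc += n * (ahead - behind)
        deltas.append(acc / denom)
    return deltas


def functionals(contour):
    """Map a variable-length contour to a fixed-length set of statistics."""
    return {
        "mean": statistics.fmean(contour),
        "stddev": statistics.pstdev(contour),
        "min": min(contour),
        "max": max(contour),
        "range": max(contour) - min(contour),
    }


# Hypothetical per-frame loudness values (a linear ramp), for illustration only.
loudness = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
d = delta_regression(loudness)       # one delta value per frame
summary = functionals(loudness)      # functionals of the descriptor itself
delta_summary = functionals(d)       # functionals of its delta contour
```

In this sketch the descriptor contour and its delta contour are summarized separately, so an input of any length yields a feature vector of fixed size, which is the usual reason for applying functionals before passing features to a classifier.
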
openSMILE is actively developed, with new features such as Teager energy, ToBI pitch descriptors, and psychoacoustic measures like sharpness and roughness being considered for integration. It will soon support MPEG-7 LLD XML output and is expected to link with OpenCV for fusing visual and acoustic features. Future work will focus on improved multithreading support and cooperation with related projects to ensure coverage of a broad variety of typically employed features in one piece of fast, lightweight, flexible open-source software.
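
The summary states that components are connected so that computations are not duplicated. The following Python sketch illustrates that general idea only. It does not reproduce openSMILE's configuration syntax or its C++ API; the `SharedStore` class, the level names, and the sample frames are all hypothetical. Each named intermediate result is computed once and then reused by every component that reads it.

```python
import math


class SharedStore:
    """Hypothetical shared store: each named level is computed at most once."""

    def __init__(self):
        self._producers = {}      # level name -> (input level names, function)
        self._levels = {}         # level name -> cached data
        self.compute_counts = {}  # level name -> how often it was computed

    def set_source(self, level, data):
        self._levels[level] = data

    def register(self, level, inputs, fn):
        self._producers[level] = (inputs, fn)

    def read(self, level):
        if level not in self._levels:
            inputs, fn = self._producers[level]
            args = [self.read(name) for name in inputs]
            self._levels[level] = fn(*args)
            self.compute_counts[level] = self.compute_counts.get(level, 0) + 1
        return self._levels[level]


def frame_energy(frames):
    """Mean squared amplitude of each frame."""
    return [sum(x * x for x in frame) / len(frame) for frame in frames]


def log_energy(energy):
    """Log of each frame energy, with a small floor to avoid log(0)."""
    return [math.log(e + 1e-12) for e in energy]


def peak_energy(energy):
    """Largest frame energy."""
    return max(energy)


# Wiring step: plays the role a configuration file would play (names are invented).
store = SharedStore()
store.set_source("frames", [[0.0, 0.0], [1.0, -1.0], [2.0, 2.0]])
store.register("energy", ["frames"], frame_energy)
store.register("log_energy", ["energy"], log_energy)
store.register("peak_energy", ["energy"], peak_energy)

log_e = store.read("log_energy")
peak = store.read("peak_energy")
# "energy" feeds two consumers but store.compute_counts["energy"] stays at 1.
```

The sketch caches results on demand for brevity. A real-time extractor would more likely push frames through buffers incrementally, but the sharing principle is the same.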

openSMILE – The Munich Versatile and Fast Open-Source Audio Feature Extractor

2010 | Florian Eyben, Martin Wöllmer, Björn Schuller