2013 | Florian Eyben, Felix Weninger, Florian Groß, Björn Schuller
The paper presents recent developments in openSMILE, an open-source multimedia feature extractor. Version 2.0 integrates audio, music, and video features into a single multi-modal framework, allowing joint, time-synchronized processing of audio and video descriptors in both online incremental and batch modes. Frame-level low-level descriptors cover speech, music, and video features such as MFCCs, Chroma, CENS, and optical flow histograms, and the toolkit also supports voice activity detection, pitch tracking, and face detection. On top of these, statistical functionals such as moments, peaks, and regression parameters can be computed; post-processing includes classifiers such as SVMs and export to toolkits such as Weka and HTK. openSMILE is implemented in C++, runs on Unix and Windows platforms, and has a modular architecture that is easy to extend via plugins. openSMILE 2.0 is distributed under a research license and can be downloaded from http://opensmile.sourceforge.net/.
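To make the described chain of low-level descriptors, statistical functionals, and toolkit export more concrete, the following is a minimal Python sketch of that idea. It is illustrative only: the function names, the stand-in descriptors (log energy and zero-crossing rate instead of MFCC or Chroma), and the feature names are hypothetical and do not correspond to openSMILE's configuration files, components, or API; only the ARFF output format targeted at Weka follows the real specification.

```python
# Illustrative sketch of an LLD -> functionals -> ARFF export pipeline.
# All names are hypothetical; this is not openSMILE code or its API.
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])

def low_level_descriptors(frames):
    """Per-frame LLDs: log energy and zero-crossing rate (stand-ins for MFCC, Chroma, ...)."""
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return {"logEnergy": energy, "zcr": zcr}

def functionals(contour):
    """Statistical functionals over one LLD contour: moments, peak count, regression slope."""
    t = np.arange(len(contour))
    slope = np.polyfit(t, contour, 1)[0] if len(contour) > 1 else 0.0
    peaks = np.sum((contour[1:-1] > contour[:-2]) & (contour[1:-1] > contour[2:]))
    return {
        "mean": float(np.mean(contour)),
        "stddev": float(np.std(contour)),
        "skewness": float(np.mean(((contour - contour.mean()) / (contour.std() + 1e-10)) ** 3)),
        "numPeaks": int(peaks),
        "linregSlope": float(slope),
    }

def to_arff(feature_rows, relation="utterance_features"):
    """Serialise one feature vector per instance into Weka's ARFF text format."""
    names = sorted(feature_rows[0])
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {n} NUMERIC" for n in names]
    lines += ["", "@DATA"]
    lines += [",".join(str(row[n]) for n in names) for row in feature_rows]
    return "\n".join(lines)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(16000)            # 1 s of placeholder audio at 16 kHz
    llds = low_level_descriptors(frame_signal(signal))
    features = {f"{lld}_{func}": value
                for lld, contour in llds.items()
                for func, value in functionals(contour).items()}
    print(to_arff([features]))
```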
The paper discusses the design and functionality of openSMILE, emphasizing its ability to handle real-time, incremental processing alongside conventional batch feature extraction. It describes the architecture, which is centered on a data memory that links data sources, data processors, and data sinks. The system supports a wide range of audio and video descriptors and statistical functionals, exports to various file formats, and additionally provides a multi-loop processing mode and context-sensitive recurrent neural networks.
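The source-to-sink data-flow idea can be sketched in a few lines of Python. The classes below (DataMemory, WaveSource, EnergyProcessor, PrintSink) are hypothetical stand-ins chosen for this summary and deliberately do not mirror openSMILE's actual C++ components; they only illustrate how a shared buffer can decouple producers from consumers so that frames are processed incrementally as they arrive.

```python
# Conceptual sketch of the source -> data memory -> processor -> sink design;
# class names are hypothetical and do not correspond to openSMILE's C++ classes.
from collections import deque

class DataMemory:
    """Central buffer of named "levels" that decouples writers from readers."""
    def __init__(self):
        self.levels = {}

    def write(self, level, frame):
        self.levels.setdefault(level, deque()).append(frame)

    def read(self, level):
        buf = self.levels.get(level)
        return buf.popleft() if buf else None

class WaveSource:
    """Data source: pushes one raw frame into the data memory per tick."""
    def __init__(self, frames, mem):
        self.frames, self.mem, self.pos = frames, mem, 0

    def tick(self):
        if self.pos >= len(self.frames):
            return False                       # no more input
        self.mem.write("wave", self.frames[self.pos])
        self.pos += 1
        return True

class EnergyProcessor:
    """Data processor: reads raw frames, writes a derived descriptor."""
    def __init__(self, mem):
        self.mem = mem

    def tick(self):
        frame = self.mem.read("wave")
        if frame is None:
            return False
        self.mem.write("energy", sum(x * x for x in frame) / len(frame))
        return True

class PrintSink:
    """Data sink: consumes the final descriptors (file writer, classifier, ...)."""
    def __init__(self, mem):
        self.mem = mem

    def tick(self):
        value = self.mem.read("energy")
        if value is None:
            return False
        print(f"energy = {value:.4f}")
        return True

if __name__ == "__main__":
    mem = DataMemory()
    components = [WaveSource([[0.1, 0.2], [0.3, 0.4], [0.0, 0.5]], mem),
                  EnergyProcessor(mem), PrintSink(mem)]
    # Incremental "tick" loop: keep running rounds until no component makes progress.
    while True:
        progressed = [c.tick() for c in components]
        if not any(progressed):
            break
```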
The paper presents case studies and benchmarks, including paralinguistic information extraction, speaker characterization in web videos, and violence detection, which demonstrate the versatility of openSMILE in multimedia recognition tasks. The system has been used in several challenges and has shown state-of-the-art results in emotion, age, gender, and other recognition tasks. Future developments include integrating audio and video input, implementing online audio enhancement algorithms, and adding a TCP/IP network interface for real-time interaction with distributed systems. openSMILE has become a standard reference toolkit in computational paralinguistics and is expected to be widely adopted by other communities.