Content-Based Classification, Search, and Retrieval of Audio

Fall 1996 | Erling Wold, Thom Blum, Douglas Keislar, and James Wheaton
This paper presents a content-based audio classification, search, and retrieval system. It reduces audio to perceptual and acoustical features, so users can search for or retrieve sounds by their characteristics. The system addresses the growing volume of audio data in applications that typically treat audio as an opaque collection of bytes.
Features such as loudness, pitch, brightness, bandwidth, and harmonicity are extracted from the audio signal and stored as a feature vector. Statistical techniques then use these vectors for classification and retrieval. The system can be trained by example: users define a class by supplying sounds that share the acoustic properties of interest.
Users can search on a single feature or a combination of features. They can also select reference sounds and ask for sounds that are similar or dissimilar to them. Queries can be fuzzy, matching sounds that resemble a given sound or that have certain characteristics.
The system handles a wide range of audio, including animal sounds, machine noises, musical instruments, speech, and natural sounds. It was tested on a database of 400 sound files, where it classified and retrieved sounds effectively on the basis of their acoustic features. It can also segment complex recordings into individual sounds, which allows each sound to be classified and retrieved separately.
The approach is implemented in SoundFisher, an application for searching and managing audio data. In SoundFisher, users can run content-based searches, set up custom classes, and refine queries with a combination of constraints and fuzzy logic. Beyond sound databases, the approach also applies to audio editors, surveillance, and automatic segmentation of audio and video.
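The feature-vector pipeline described above can be sketched in a few dozen lines of Python. This is a minimal, standard-library-only illustration built on our own assumptions, not the authors' SoundFisher code. Loudness is approximated as RMS level in dB, brightness as the spectral centroid, and bandwidth as the magnitude-weighted spread around the centroid. Pitch and harmonicity are left out for brevity. Each per-frame feature trajectory is summarized by its mean and standard deviation. A class trained by example is modeled by the per-dimension mean and standard deviation of its examples, and similarity is a standard-deviation-normalized distance. The function names, the 64-sample frame length, and the `std_floor` parameter are hypothetical.

```python
import cmath
import math


def frame_features(frame, sample_rate):
    """Return (loudness_db, brightness_hz, bandwidth_hz) for one frame (assumed definitions)."""
    n = len(frame)
    # Loudness approximated as RMS level in dB relative to full scale 1.0.
    rms = math.sqrt(sum(x * x for x in frame) / n)
    loudness = 20.0 * math.log10(rms + 1e-12)
    # Naive DFT magnitudes for bins 0..n/2; O(n^2), acceptable for short demo frames.
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags) or 1e-12
    # Brightness as spectral centroid; bandwidth as magnitude-weighted RMS spread around it.
    brightness = sum(f * m for f, m in zip(freqs, mags)) / total
    bandwidth = math.sqrt(sum((f - brightness) ** 2 * m for f, m in zip(freqs, mags)) / total)
    return loudness, brightness, bandwidth


def mean_std(values):
    mean = sum(values) / len(values)
    return mean, math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def feature_vector(signal, sample_rate, frame_len=64):
    """Summarize each per-frame feature trajectory by its mean and standard deviation.
    Assumes the signal holds at least one full frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    vec = []
    for track in zip(*(frame_features(f, sample_rate) for f in frames)):
        vec.extend(mean_std(track))
    return vec


def train_class(examples):
    """Model a user-defined class by the per-dimension (mean, std) of its example vectors."""
    return [mean_std(dim) for dim in zip(*examples)]


def distance(vec, model, std_floor=1.0):
    """Std-normalized Euclidean distance (a diagonal, Mahalanobis-style measure).
    std_floor (hypothetical) keeps near-constant dimensions from dominating."""
    return math.sqrt(sum(((v - m) / max(s, std_floor)) ** 2 for v, (m, s) in zip(vec, model)))


def classify(vec, models):
    """Name of the class model closest to vec."""
    return min(models, key=lambda name: distance(vec, models[name]))


def rank(database, model, dissimilar=False):
    """Database keys ordered from most to least similar (or the reverse)."""
    return sorted(database, key=lambda k: distance(database[k], model), reverse=dissimilar)


# Usage with synthetic sine tones; multiples of 125 Hz fall exactly on DFT bins
# for an 8 kHz sample rate and 64-sample frames.
SR = 8000


def tone(freq, n=256):
    return [math.sin(2 * math.pi * freq * t / SR) for t in range(n)]


models = {
    "low": train_class([feature_vector(tone(f), SR) for f in (500, 625)]),
    "high": train_class([feature_vector(tone(f), SR) for f in (2000, 2125)]),
}
database = {f"tone_{f}": feature_vector(tone(f), SR) for f in (3000, 750, 2000, 500)}
print(classify(database["tone_750"], models))  # low
print(rank(database, models["low"]))  # ['tone_500', 'tone_750', 'tone_2000', 'tone_3000']
```

Passing `dissimilar=True` to `rank` puts the least similar sounds first, which corresponds to asking for sounds unlike the reference examples. Each dimension is divided by the class's own spread. As a result, a class whose examples vary in brightness but not in loudness tolerates brightness differences more than loudness differences, so the chosen examples effectively decide which properties matter.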
The paper also discusses challenges and limitations of current audio retrieval systems, including the difficulty of separating simultaneous sound sources and the need for richer features to handle complex audio. The authors conclude that their approach provides a robust, flexible framework for content-based audio retrieval that could be applied to a wide range of audio applications.