November 3-7, 2014, Orlando, Florida, USA | Justin Salamon, Christopher Jacoby, Juan Pablo Bello
This paper addresses the challenges of automatic urban sound classification, a task relevant to multimedia retrieval and urban informatics. The authors identify two main obstacles: the lack of a common taxonomy and the scarcity of large, real-world, annotated datasets. To tackle these, they propose a taxonomy of urban sounds and introduce *UrbanSound*, a dataset containing 27 hours of audio with 18.5 hours of annotated sound events across 10 sound classes. The taxonomy is designed to be detailed and relevant to urban noise pollution, focusing on specific sound sources such as car horns and jackhammers. The dataset, collected from Freesound, consists of real-world field recordings and is described by the authors as the largest free dataset of labeled urban sound events available. The paper also presents *UrbanSound8K*, a subset of short audio snippets for sound source identification. Through baseline classification experiments, the authors study the challenges the dataset poses, including sensitivity to temporal scale, confusion due to timbre similarity, and background interference. The findings highlight the need for multi-scale analysis and better modeling of temporal dynamics in future research. The dataset is expected to advance research in urban sound analysis and multimedia applications.