Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

ACCEPTED NOVEMBER 2016 | Justin Salamon and Juan Pablo Bello
This paper presents a deep convolutional neural network (CNN) architecture for environmental sound classification and explores audio data augmentation as a way to address the challenge of limited labeled data. The proposed CNN, combined with data augmentation, achieves state-of-the-art performance. The study shows that the improvement comes from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN trained without augmentation and a "shallow" dictionary learning model trained with augmentation.

The CNN architecture consists of three convolutional layers, two pooling operations, and two fully connected layers. The input to the network is time-frequency patches (TF-patches) extracted from the log-scaled mel-spectrogram of the audio signal. The model is trained with cross-entropy loss using mini-batch stochastic gradient descent, with dropout and L2 regularization applied to prevent overfitting.

The study also investigates the impact of different audio data augmentations on the model's performance. Four types of augmentation are tested: time stretching, pitch shifting, dynamic range compression, and background noise addition. The results show that the proposed CNN with data augmentation significantly improves classification accuracy compared to training on the original dataset alone. However, some augmentations, such as dynamic range compression and background noise, degrade performance on certain classes, particularly those characterized by continuous sounds.

The study further examines the influence of each augmentation on the model's classification accuracy for each class. Each sound class is affected differently by each augmentation set, which suggests that performance could be further improved by applying class-conditional data augmentation during training.
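The TF-patch input pipeline can be sketched as follows. The paper uses 128-band log-scaled mel-spectrograms sliced into fixed-length patches of roughly three seconds; the exact patch length and hop used below are illustrative assumptions, and the spectrogram itself is stood in for by a random array rather than a real audio feature.

```python
import numpy as np

def extract_tf_patches(log_mel, patch_frames=128, hop_frames=64):
    """Slice fixed-size time-frequency patches from a log-mel spectrogram.

    log_mel: array of shape (n_mels, n_frames), e.g. computed with librosa.
    Returns an array of shape (n_patches, n_mels, patch_frames).
    patch_frames and hop_frames are illustrative, not the paper's exact values.
    """
    n_mels, n_frames = log_mel.shape
    starts = range(0, n_frames - patch_frames + 1, hop_frames)
    patches = [log_mel[:, s:s + patch_frames] for s in starts]
    if not patches:
        return np.empty((0, n_mels, patch_frames))
    return np.stack(patches)

# Example: a dummy 128-band "spectrogram" with 300 frames.
spec = np.random.randn(128, 300)
patches = extract_tf_patches(spec)
print(patches.shape)  # (3, 128, 128): patches start at frames 0, 64, and 128
```

At training time each patch is treated as an independent example with the label of its source clip; at test time, per-patch predictions from one clip are typically aggregated (e.g. averaged) into a clip-level prediction.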
The results indicate that pitch augmentations have the greatest positive impact on performance and are the only augmentation sets that do not have a negative impact on any of the classes. The study concludes that the combination of a deep, high-capacity model and an augmented training set is key to achieving state-of-the-art results in environmental sound classification.
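As a concrete illustration of one of the four augmentations, the sketch below mixes a background-noise recording into a clip at a chosen signal-to-noise ratio. The pure-NumPy mixing and the SNR value are assumptions for illustration only; the other augmentations (time stretching, pitch shifting, dynamic range compression) need dedicated DSP tools such as a phase vocoder and are not reproduced here.

```python
import numpy as np

def mix_background(signal, noise, snr_db):
    """Mix `noise` into `signal` at a target signal-to-noise ratio in dB.

    Both inputs are 1-D float arrays at the same sample rate; the noise is
    tiled or truncated to match the signal length. The SNR is illustrative.
    """
    reps = int(np.ceil(len(signal) / len(noise)))
    noise = np.tile(noise, reps)[: len(signal)]
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose gain so that 10*log10(sig_power / (gain**2 * noise_power)) == snr_db.
    gain = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + gain * noise

rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone at 16 kHz
street = rng.standard_normal(8000)                          # shorter noise clip
augmented = mix_background(clip, street, snr_db=6.0)
print(augmented.shape)  # (16000,)
```

Applying such transformations to the raw audio (rather than the spectrogram) and then recomputing features lets the same labeled clip yield several distinct training examples, which is the mechanism behind the accuracy gains the paper reports.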