SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

3 Dec 2019 | Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
SpecAugment is a simple and effective data augmentation method for speech recognition, applied directly to the feature inputs of a neural network. It combines three augmentations: time warping, frequency masking, and time masking. Together they are designed to make the network robust to deformations in the time direction, partial loss of frequency information, and partial loss of small segments of speech. The authors apply SpecAugment to Listen, Attend and Spell (LAS) networks for end-to-end speech recognition and achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h datasets, outperforming previous methods even without the use of language models. These results show that SpecAugment improves both the robustness and the performance of ASR systems.
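The two masking augmentations described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the input is a 2-D array of shape (frequency bins, time steps), applies a single mask of each kind, and fills the masked band with 0.0. It omits time warping entirely. The function names, the default maximum widths, and the fill value are hypothetical choices made for illustration.

```python
import numpy as np


def mask_along_axis(spec, max_width, axis, rng, fill_value=0.0):
    """Return a copy of `spec` with one random contiguous band along `axis` masked."""
    out = spec.copy()
    size = out.shape[axis]
    max_width = min(max_width, size)
    # Band width is drawn uniformly from 0..max_width (inclusive).
    width = int(rng.integers(0, max_width + 1))
    # Start position is drawn so that the whole band fits inside the axis.
    start = int(rng.integers(0, size - width + 1))
    index = [slice(None)] * out.ndim
    index[axis] = slice(start, start + width)
    out[tuple(index)] = fill_value
    return out


def spec_augment_masks(spec, max_freq_width=8, max_time_width=20, seed=None):
    """Apply one frequency mask and one time mask (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    # Axis 0 is assumed to be frequency, axis 1 time.
    out = mask_along_axis(spec, max_freq_width, axis=0, rng=rng)
    out = mask_along_axis(out, max_time_width, axis=1, rng=rng)
    return out


if __name__ == "__main__":
    features = np.ones((80, 100))  # stand-in for a spectrogram-like feature matrix
    augmented = spec_augment_masks(features, seed=0)
    print(augmented.shape)
```

Because the masks are drawn at random, such a function would typically be called on each training example every time it is loaded, so the network sees a different corrupted view of the same utterance in each epoch.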