Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

April 2018 | Pete Warden
The Speech Commands dataset is designed for limited-vocabulary speech recognition, specifically keyword spotting. It provides a standard dataset for training and evaluating models that detect when one of a small set of target words is spoken, with as few false positives as possible. It contains 105,829 utterances of 35 words, stored as one-second WAVE files, along with background noise files that help models learn to distinguish speech from non-speech. It is released under the Creative Commons BY 4.0 license, so it can be freely used for research and development.

The utterances were collected with a web-based application that recorded speakers through phone or laptop microphones. The dataset covers English only and is built around a core vocabulary of 20 common words, with more words added in later versions. The recordings were processed for quality: clips that were too short or too quiet were removed, and the loudest section of each recording was extracted. A manual review then filtered out incorrect or unintelligible utterances.

The dataset includes test files for evaluating model performance, with metrics such as Top-One error and Streaming Error Metrics. Reported results for version 2 of the dataset improve on those for version 1. The dataset has been used in applications including training models for ARM microcontrollers and testing adversarial attacks on voice interfaces.

The dataset aims to enable the development and comparison of models for on-device keyword spotting by providing a benchmark for accuracy and energy efficiency. As a standardized, publicly available resource, it supports collaboration and progress in speech recognition.
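
The quality-processing steps mentioned above (dropping quiet clips and keeping the loudest one-second section) can be illustrated with a minimal Python sketch. This is not the dataset's actual tooling: the function names, the 16 kHz sample rate, the assumption that samples are floats scaled to [-1, 1], and the 0.004 quietness threshold are all illustrative assumptions.

```python
from typing import List, Sequence


def loudest_window_start(samples: Sequence[float], window: int) -> int:
    """Return the start index of the `window`-sample span with the most energy."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) <= window:
        return 0
    # Energy (sum of squares) of the first window, then slide one sample at a time.
    energy = sum(s * s for s in samples[:window])
    best_energy, best_start = energy, 0
    for start in range(1, len(samples) - window + 1):
        leaving = samples[start - 1]
        entering = samples[start + window - 1]
        energy += entering * entering - leaving * leaving
        if energy > best_energy:
            best_energy, best_start = energy, start
    return best_start


def extract_loudest(samples: Sequence[float],
                    sample_rate: int = 16000,   # assumed sample rate
                    seconds: float = 1.0) -> List[float]:
    """Cut out the loudest `seconds`-long section, zero-padding short clips."""
    window = int(sample_rate * seconds)
    start = loudest_window_start(samples, window)
    clip = list(samples[start:start + window])
    clip.extend([0.0] * (window - len(clip)))
    return clip


def is_too_quiet(samples: Sequence[float],
                 min_mean_abs: float = 0.004) -> bool:  # hypothetical threshold
    """Flag clips whose mean absolute amplitude falls below the threshold."""
    if not samples:
        return True
    return sum(abs(s) for s in samples) / len(samples) < min_mean_abs
```

A recording would first be checked with `is_too_quiet` and, if kept, trimmed with `extract_loudest`. The sliding sum keeps the search linear in the clip length instead of recomputing each window's energy from scratch.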