Robust Automatic Speech Recognition Using Wavelet-Based Adaptive Wavelet Thresholding: A Review


01 February 2024 | Mahadevaswamy Shanthamallappa, Kiran Puttegowda, Naveen Kumar Hullahalli Nannappa, Sudheesh Kannur Vasudeva Rao
The paper "Robust Automatic Speech Recognition Using Wavelet-Based Adaptive Wavelet Thresholding: A Review" by Mahadevaswamy Shanthamallappa, Kiran Puttegowda, Naveen Kumar Hullahalli Nannappa, and Sudheesh Kannur Vasudeva Rao, discusses the challenges and advancements in automatic speech recognition (ASR) systems, particularly in noisy environments. The authors highlight that ASR systems perform best in closed environments with minimal background noise but struggle in environments with significant noise. They emphasize the impact of various factors such as vocabulary size, speech sound units, spoken environment, native language, transmission channel, speaker's emotional and health state, age, and speech corpus design on ASR performance. The paper introduces the history of ASR, the human speech production and perception process, ASR terminologies, and the framework of ASR. It also discusses the challenges associated with ASR design, focusing on speech corpus design and preprocessing. Traditional speech enhancement methods based on time and frequency domains are noted to be ineffective against nonstationary noise due to their fixed window duration. In contrast, wavelet transform-based soft thresholding techniques, which use variable window durations, are proposed as a powerful tool for preprocessing speech signals contaminated by additive Gaussian noise at various signal-to-noise ratios (SNRs). The proposed method involves decomposing speech signals into high- and low-frequency subbands, where most noise is present in the high-frequency bands. The wavelet-based Bayes shrink algorithm, combined with time-adaptive thresholds and soft thresholding, is applied to the high-frequency subbands to remove noise. The performance of this method is compared with state-of-the-art hard thresholding methods using the Kannada speech corpus and benchmarking TIMIT dataset. The results show that the proposed method performs better across various SNR levels. The paper is organized into sections covering the background, literature survey, dataset collection, experimental results, and conclusion. The background section delves into the production and perception of human speech, while the literature survey provides insights into related works. The dataset section highlights the creation and partitioning of the Kannada speech corpus, and the results and discussions section presents the experimental outcomes.The paper "Robust Automatic Speech Recognition Using Wavelet-Based Adaptive Wavelet Thresholding: A Review" by Mahadevaswamy Shanthamallappa, Kiran Puttegowda, Naveen Kumar Hullahalli Nannappa, and Sudheesh Kannur Vasudeva Rao, discusses the challenges and advancements in automatic speech recognition (ASR) systems, particularly in noisy environments. The authors highlight that ASR systems perform best in closed environments with minimal background noise but struggle in environments with significant noise. They emphasize the impact of various factors such as vocabulary size, speech sound units, spoken environment, native language, transmission channel, speaker's emotional and health state, age, and speech corpus design on ASR performance. The paper introduces the history of ASR, the human speech production and perception process, ASR terminologies, and the framework of ASR. It also discusses the challenges associated with ASR design, focusing on speech corpus design and preprocessing. 
Traditional speech enhancement methods based on the time and frequency domains are noted to be ineffective against nonstationary noise because of their fixed window duration. In contrast, wavelet transform-based soft thresholding techniques, which use variable window durations, are proposed as a powerful tool for preprocessing speech signals contaminated by additive Gaussian noise at various signal-to-noise ratios (SNRs). The proposed method decomposes speech signals into high- and low-frequency subbands, with most of the noise concentrated in the high-frequency bands. A wavelet-based BayesShrink algorithm, combined with time-adaptive thresholds and soft thresholding, is applied to the high-frequency subbands to remove noise. The performance of this method is compared with state-of-the-art hard thresholding methods on a Kannada speech corpus and the benchmark TIMIT dataset, and the results show that the proposed method performs better across various SNR levels.

The paper is organized into sections covering the background, literature survey, dataset collection, experimental results, and conclusion. The background section delves into the production and perception of human speech, while the literature survey reviews related work. The dataset section describes the creation and partitioning of the Kannada speech corpus, and the results and discussion section presents the experimental outcomes.
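To make the denoising pipeline concrete, the following sketch shows how subband decomposition, a BayesShrink threshold, and soft thresholding fit together. It is not the authors' implementation: it uses the PyWavelets library with an assumed Daubechies-4 wavelet and a fixed four-level decomposition, applies the standard BayesShrink rule per detail subband, and omits the paper's time-adaptive threshold refinement; the function and parameter names are illustrative.

# Minimal sketch (assumptions noted above), not the authors' method.
import numpy as np
import pywt

def bayes_shrink_denoise(noisy_speech, wavelet="db4", level=4):
    """Denoise a 1-D speech signal by soft-thresholding its detail subbands."""
    # Decompose into one approximation (low-frequency) band and `level`
    # detail (high-frequency) bands, where most additive noise resides.
    coeffs = pywt.wavedec(noisy_speech, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]

    # Robust noise estimate from the finest detail band (median absolute deviation).
    sigma_noise = np.median(np.abs(details[-1])) / 0.6745

    denoised_details = []
    for d in details:
        # BayesShrink threshold: sigma_noise^2 / sigma_signal for each subband.
        sigma_signal = np.sqrt(max(np.var(d) - sigma_noise**2, 0.0))
        if sigma_signal == 0.0:
            thr = np.max(np.abs(d))  # subband is essentially all noise
        else:
            thr = sigma_noise**2 / sigma_signal
        # Soft thresholding shrinks the surviving coefficients toward zero
        # instead of only zeroing those below the threshold (hard thresholding).
        denoised_details.append(pywt.threshold(d, thr, mode="soft"))

    # Reconstruct from the untouched approximation band and the
    # thresholded detail bands.
    return pywt.waverec([approx] + denoised_details, wavelet)

# Hypothetical usage with a synthetic signal standing in for a speech frame.
fs = 16000
t = np.linspace(0, 1, fs, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * np.random.randn(fs)  # additive Gaussian noise
denoised = bayes_shrink_denoise(noisy)

The key design choice the paper's comparison rests on is visible in the loop: soft thresholding shrinks every retained coefficient by the threshold amount, avoiding the discontinuities that hard thresholding (keeping large coefficients unchanged and zeroing the rest) introduces into the reconstructed speech.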