Automatic speech recognition (ASR) is a critical research area with significant challenges, especially in noisy environments. ASR systems perform well in closed environments with minimal background noise but struggle where noise is diverse. Factors such as vocabulary size, speech units, environment, language, transmission channels, speaker state, and preprocessing significantly affect ASR performance. Noise in speech signals is a major challenge because it degrades system performance. This paper reviews ASR, its history, speech production and perception, terminology, framework, and design challenges, focusing on speech corpus design and preprocessing. Traditional time- and frequency-domain methods are inadequate for nonstationary noise because of their fixed window durations. Wavelet-based soft-thresholding techniques, by contrast, can handle nonstationary noise with variable window durations, making them effective for preprocessing speech signals contaminated with additive Gaussian noise. Speech signals are decomposed into high- and low-frequency subbands, with most noise concentrated in the high-frequency bands. A wavelet-based BayesShrink algorithm combined with time-adaptive thresholds and soft thresholding is applied to the high-frequency subbands to remove noise, and the performance of soft thresholding is compared with that of hard thresholding. Experiments on the Kannada speech corpus and the TIMIT dataset show that the proposed method performs better across various SNR levels.

Keywords: Automatic speech recognition, Wavelet packet transform, Deep neural networks, Speech enhancement.

ASR is a technology that converts acoustic speech into text. It has been an active research area for over 60 years, with significant progress in recent decades, and ASR systems have evolved from handling small vocabularies to large ones. Human speech recognition remains highly robust even in adverse conditions, while machines lag behind. ASR systems are crucial for communication, especially in multilingual environments.
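As a minimal sketch of the denoising step described above (not the authors' implementation), the following shows BayesShrink-style soft thresholding of the high-frequency wavelet subbands, assuming the PyWavelets library; the wavelet choice (`db8`), decomposition depth, and function names are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets


def bayes_shrink_threshold(detail, sigma_noise):
    """BayesShrink threshold for one detail subband: t = sigma_n^2 / sigma_x,
    where sigma_x^2 is the (floored) signal variance estimate var(d) - sigma_n^2."""
    var_signal = max(np.var(detail) - sigma_noise ** 2, 1e-12)
    return sigma_noise ** 2 / np.sqrt(var_signal)


def denoise(signal, wavelet="db8", level=4, mode="soft"):
    """Wavelet denoising: threshold only the detail (high-frequency) subbands."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise std estimated from the finest detail band via the median absolute deviation.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    out = [coeffs[0]]  # approximation (low-frequency) band kept intact
    for d in coeffs[1:]:
        t = bayes_shrink_threshold(d, sigma)
        out.append(pywt.threshold(d, t, mode=mode))  # mode="soft" or "hard"
    return pywt.waverec(out, wavelet)
```

Passing `mode="hard"` instead of `"soft"` gives the hard-thresholding baseline the paper compares against; soft thresholding shrinks surviving coefficients toward zero, while hard thresholding keeps them unchanged.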
Despite these advancements, ideal speech recognition remains unachieved because of factors such as channel conditions, noise, dialects, accents, age, and speaker state. Historical research in speech processing began with Kratzenstein's work in 1769 and continued with Dudley's efforts in the 1930s. Speech recognition is now applied widely, with many consumer products using voice commands. ASR systems perform well in closed environments but degrade in noisy, open settings. Because wavelet transforms are effective for speech denoising, this paper proposes a wavelet-based preprocessing system to improve ASR performance. A 10-hour Kannada speech corpus is created, labeled, and partitioned, and a custom wavelet packet transform is applied to the speech signals. The paper is organized into sections on background, literature survey, dataset, results, and conclusion.
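The wavelet packet decomposition mentioned above splits a speech frame into equal-width subbands, with noise concentrated in the higher-frequency ones. A minimal sketch of that analysis step, again assuming PyWavelets (the `db4` wavelet and three-level depth are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np
import pywt  # PyWavelets


def wp_subbands(frame, wavelet="db4", level=3):
    """Split a speech frame into 2**level wavelet-packet subbands,
    returned as a list ordered from low to high frequency."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    # order="freq" sorts the leaf nodes by increasing frequency band.
    return [node.data for node in wp.get_level(level, order="freq")]
```

With `level=3` this yields eight subbands; thresholding would then be applied to the later (high-frequency) entries of the list before reconstruction.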