Understanding Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics

This paper presents a method for estimating the power spectral density (PSD) of nonstationary noise in a noisy speech signal, based on optimal smoothing and minimum statistics. Unlike traditional approaches that rely on a voice activity detector (VAD), the method tracks spectral minima in each frequency band without distinguishing between speech activity and speech pauses.

The method first derives the optimal smoothing parameter for recursive smoothing of the noisy speech signal's PSD by minimizing a conditional mean square error criterion. This parameter balances the variance of the smoothed PSD against its ability to track changes. An error monitoring algorithm further adjusts the smoothing parameter according to the deviation of the short-term PSD estimate from the averaged periodogram. An unbiased noise estimator is then built from the optimally smoothed PSD and the statistics of spectral minima, using a time- and frequency-dependent bias correction factor to compensate for the bias in the minimum power estimates. The estimator is suitable for real-time implementation, and it improves performance in nonstationary noise by speeding up the tracking of spectral minima.

The paper evaluates the method in the context of speech enhancement and low-bit-rate speech coding with various noise types. The results show that it performs well in nonstationary noise and provides accurate noise PSD estimates. The algorithm is efficient and generic, and the time-varying smoothing significantly improves the minimum statistics approach. Compared with traditional approaches, the method better preserves weak speech sounds, leading to improved intelligibility.
The paper also includes results from listening tests and formal quality and intelligibility tests, confirming the effectiveness of the method.
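
As a rough illustration of the two core steps summarized above, the following Python sketch tracks the noise PSD in a single frequency bin: it recursively smooths the periodogram with a time-varying smoothing parameter and then takes the minimum of the smoothed values over a sliding window. The smoothing parameter uses the closed form 1 / (1 + (P / sigma^2 - 1)^2), which follows from minimizing the conditional mean square error under the assumption that speech is absent. This is a simplified sketch, not the paper's full algorithm: the function names, the constant `bias` factor (standing in for the time- and frequency-dependent bias correction), and the default values of `window` and `alpha_max` are assumptions chosen for illustration, and the error monitoring step is omitted.

```python
from collections import deque


def optimal_alpha(p_prev, noise_psd, alpha_max=0.96):
    """Smoothing parameter that minimizes the conditional mean square error
    between the smoothed PSD and the noise PSD, assuming speech is absent."""
    ratio = p_prev / max(noise_psd, 1e-12)  # guard against division by zero
    return min(alpha_max, 1.0 / (1.0 + (ratio - 1.0) ** 2))


def track_noise_psd(periodogram, window=96, bias=1.5, alpha_max=0.96):
    """Track the noise PSD in one frequency bin.

    periodogram: sequence of |Y|^2 values, one per frame.
    window, bias, alpha_max: illustrative values (assumptions), not tuned settings.
    """
    if len(periodogram) == 0:
        return []
    smoothed = noise = float(periodogram[0])
    recent = deque(maxlen=window)  # last `window` smoothed PSD values
    estimates = []
    for value in periodogram:
        # Recursive smoothing with a time-varying smoothing parameter.
        alpha = optimal_alpha(smoothed, noise, alpha_max)
        smoothed = alpha * smoothed + (1.0 - alpha) * value
        recent.append(smoothed)
        # Minimum tracking. The constant `bias` is a placeholder for the
        # time- and frequency-dependent bias correction described above.
        noise = bias * min(recent)
        estimates.append(noise)
    return estimates


if __name__ == "__main__":
    # Constant noise floor of 1.0 with a short, loud burst standing in for speech.
    frames = [1.0] * 200
    for t in range(50, 61):
        frames[t] = 100.0
    estimate = track_noise_psd(frames, bias=1.0)
    print(min(estimate), max(estimate))
```

In this toy input the burst is shorter than the search window, so the windowed minimum keeps following the noise floor during the burst without any speech/pause decision. The smoothing parameter drops toward zero when the smoothed PSD is far from the current noise estimate, which lets the smoothed PSD follow the signal quickly, and rises toward `alpha_max` when the two agree, which reduces variance.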

Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics

July 2001 | Rainer Martin, Senior Member, IEEE