Cepstral Analysis Technique for Automatic Speaker Verification

1981, 4 | SADAOKI FURUI
This paper presents a new technique for automatic speaker verification over the telephone. The system operates on a set of time functions obtained by acoustic analysis of a fixed, sentence-long utterance. Cepstrum coefficients are extracted successively throughout the utterance by LPC analysis to form these time functions, and frequency response distortions introduced by the transmission system are removed. The time functions are then expanded by orthogonal polynomial representations and, after feature selection, brought into time registration with stored reference functions by a new dynamic-programming time warping method, which yields the overall distance. The system accepts or rejects the identity claim on the basis of this distance, and the reference functions and decision thresholds are updated for each customer. Several sets of utterances were used for evaluation, including male and female speech recorded over conventional telephone lines; male utterances processed by ADPCM and LPC coding systems were used alongside unprocessed ones. Verification error rates of one percent or less were obtained even when the reference and test utterances were subjected to different transmission conditions.
In operation, the system retrieves the reference data for the claimed identity, analyzes the sample utterance, and extracts its cepstrum coefficients. The time functions of these coefficients are expanded using orthogonal polynomials, and the coefficients most effective for speaker verification are selected according to the ratio of interspeaker to intraspeaker variability. The sample is then aligned with the reference template by the time warping method, and the resulting overall distance, weighted by intraspeaker variability, is compared with a threshold to accept or reject the claim. The paper also examines the effectiveness of cepstrum normalization and of the polynomial coefficients, and the impact of different transmission systems on verification performance.
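The processing chain above can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: LPC analysis itself is omitted (per-frame predictor coefficients are taken as given), the cepstrum comes from the standard LPC-to-cepstrum recursion, the orthogonal polynomial expansion uses 0th-, 1st- and 2nd-order discrete polynomials over a sliding window whose half-width `K` is a hypothetical choice, feature selection uses a simple inter-/intraspeaker variance ratio, and the alignment is a textbook symmetric DTW that is not necessarily the paper's own warping rule. All function names and the threshold value are hypothetical.

```python
import numpy as np

# Hypothetical helper names throughout; a sketch, not the paper's implementation.


def lpc_to_cepstrum(a, n_ceps):
    """LPC predictor coefficients a[0..p-1] -> cepstrum c_1..c_{n_ceps}.

    Assumption: all-pole model 1 / (1 - sum_k a_k z^-k), gain term c_0 omitted.
    Recursion: c_n = a_n + sum_{k=max(1,n-p)}^{n-1} (k/n) c_k a_{n-k}, a_n = 0 for n > p.
    """
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c


def poly_coeffs(x, K=2):
    """0th-, 1st- and 2nd-order orthogonal polynomial coefficients of each
    time function in x (T frames x D) over a sliding (2K+1)-frame window.
    Assumption: window half-width K is a hypothetical choice."""
    k = np.arange(-K, K + 1, dtype=float)
    bases = [np.ones_like(k), k, k**2 - K * (K + 1) / 3]   # mutually orthogonal on k
    xp = np.pad(x, ((K, K), (0, 0)), mode="edge")          # repeat edge frames
    win = np.stack([xp[t:t + 2 * K + 1] for t in range(x.shape[0])])  # (T, 2K+1, D)
    # Project each window onto each basis polynomial -> (T, 3D)
    return np.concatenate(
        [np.einsum("k,tkd->td", b, win) / (b @ b) for b in bases], axis=1)


def f_ratio(groups):
    """Interspeaker / intraspeaker variance ratio per feature.
    groups: list of (N_i, D) arrays, one per speaker."""
    means = np.stack([g.mean(axis=0) for g in groups])
    intra = np.mean([g.var(axis=0) for g in groups], axis=0)
    return means.var(axis=0) / intra


def dtw_distance(x, ref, var):
    """Symmetric DTW (step weights 1, 2, 1) between sample x (T1 x D) and
    reference ref (T2 x D). Local distance: squared difference weighted by
    1/var, the per-feature intraspeaker variance. Normalized by T1 + T2."""
    d = (((x[:, None, :] - ref[None, :, :]) ** 2) / var).sum(axis=2)
    T1, T2 = d.shape
    g = np.full((T1 + 1, T2 + 1), np.inf)
    g[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            g[i, j] = min(g[i - 1, j] + d[i - 1, j - 1],
                          g[i - 1, j - 1] + 2 * d[i - 1, j - 1],
                          g[i, j - 1] + d[i - 1, j - 1])
    return g[T1, T2] / (T1 + T2)


def verify(sample, ref, var, threshold):
    """Accept the identity claim iff the overall distance is within the
    (hypothetical, per-customer) threshold."""
    return dtw_distance(sample, ref, var) <= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lpc = rng.uniform(-0.3, 0.3, size=(40, 12))    # stand-in LPC frames, not real speech
    ceps = np.stack([lpc_to_cepstrum(a, 12) for a in lpc])
    ceps -= ceps.mean(axis=0)                       # remove per-utterance cepstral mean
    feats = poly_coeffs(ceps)                       # (40, 36) polynomial coefficients
    var = feats.var(axis=0) + 1e-9                  # stand-in intraspeaker variance
    slow = np.repeat(feats, 2, axis=0)              # same frames, twice as slow
    print(verify(feats, slow, var, threshold=1e-6))  # True: warping absorbs the tempo change
```

Because the warping absorbs tempo differences, a sample that is only a slowed-down copy of the reference scores a distance of zero. In the system described here, both the reference functions and the threshold are kept per customer and updated over time.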
Results show that cepstrum normalization substantially improves performance and that the system is robust to variations in transmission conditions. The polynomial coefficients and the dynamic-programming time warping further improve accuracy, and the system adapts to different utterance lengths and transmission systems, which supports its use in practical telephone applications. The paper concludes that the proposed techniques are effective for speaker verification over telephone lines.
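One way to see why cepstrum normalization helps across transmission conditions: a fixed linear channel multiplies every frame's spectrum by the same response H(w), and because the cepstrum is the inverse transform of the log spectrum, that product becomes an additive constant in every frame's cepstrum. Subtracting each coefficient's average over the utterance therefore cancels it. The sketch below assumes the normalization is exactly this mean subtraction, uses synthetic magnitude spectra rather than real speech, and uses a hypothetical spectral-tilt channel; the function names are illustrative.

```python
import numpy as np


def real_cepstrum(half_mag_spec, n_ceps):
    """Real cepstrum c_1..c_{n_ceps} of each frame, given |X(w)| on the
    non-negative FFT bins (frames x bins)."""
    return np.fft.irfft(np.log(half_mag_spec), axis=1)[:, 1:n_ceps + 1]


def cepstral_mean_normalize(c):
    """Assumed normalization: subtract each coefficient's utterance-long average."""
    return c - c.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
n_frames, n_bins = 50, 129                                 # 129 bins = rfft of a 256-point frame
speech = rng.uniform(0.1, 10.0, size=(n_frames, n_bins))   # synthetic |S_t(w)|, not real speech
channel = np.linspace(1.0, 0.2, n_bins)                    # hypothetical fixed |H(w)| (spectral tilt)

c_clean = real_cepstrum(speech, 16)
c_tel = real_cepstrum(speech * channel, 16)                # same channel multiplies every frame

# The channel adds the same cepstral offset to every frame...
print(np.allclose(c_tel - c_clean, (c_tel - c_clean)[0]))  # True
# ...so subtracting the per-utterance mean cancels it.
print(np.allclose(cepstral_mean_normalize(c_tel), cepstral_mean_normalize(c_clean)))  # True
```

This covers only a linear, time-invariant channel; additive noise and the nonlinear effects of coding systems such as ADPCM or LPC are not removed by mean subtraction.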