An Empirical Study of Smoothing Techniques for Language Modeling


Stanley F. Chen, Joshua Goodman
This paper presents an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). The study investigates how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, measured through the cross-entropy of test data. The authors introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a simple linear interpolation technique, both of which outperform existing methods.

Smoothing is essential in the construction of n-gram language models, which are used in speech recognition and other domains. A language model is a probability distribution over strings that attempts to reflect the frequency with which each string occurs in natural text. While smoothing is a central issue in language modeling, the literature lacks a definitive comparison of the many existing techniques: previous studies have compared only a small number of methods, on a single corpus and with a single training data size, making it difficult for researchers to choose between smoothing schemes.

In this work, the authors carry out an extensive empirical comparison of the most widely used smoothing techniques. They experiment with many training data sizes on varied corpora using both bigram and trigram models, and demonstrate that the relative performance of the techniques depends greatly on training data size and n-gram order. For example, Church-Gale smoothing performs best on bigram models built from large training sets, while Katz smoothing performs best on bigram models built from smaller data. For methods with tunable parameters, the authors perform an automated search for optimal values and show that sub-optimal parameter selection can significantly degrade performance.

The two novel smoothing techniques are one method belonging to the class of smoothing models described by Jelinek and Mercer and one very simple linear interpolation method. Both yield good performance in bigram models and superior performance in trigram models.

Performance is measured by cross-entropy on test data. The study shows that additive smoothing performs poorly, while methods such as Katz and Jelinek-Mercer smoothing consistently perform well. The novel methods, average-count and one-count, perform well across training data sizes and are superior for trigram models. The results indicate that the relative performance of smoothing techniques depends on training data size and n-gram order, and that sub-optimal parameter selection can significantly affect performance. The study also highlights the importance of considering multiple training set sizes and trying both bigram and trigram models when characterizing the relative performance of two techniques.
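To make the interpolation idea concrete, the following is a minimal Python sketch of Jelinek-Mercer-style smoothing for a bigram model, in which the maximum-likelihood bigram estimate is linearly interpolated with the unigram estimate. It is an illustration under simplifying assumptions, not the authors' implementation: the weight lam is a single hand-set constant and every name is our own, whereas the paper buckets the interpolation weights by properties of the conditioning history and estimates them on held-out data.

from collections import Counter

def train_counts(tokens):
    """Collect unigram and bigram counts from a training token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams

def jm_bigram_prob(w_prev, w, unigrams, bigrams, total, lam=0.7):
    """P(w | w_prev) = lam * ML bigram estimate + (1 - lam) * unigram estimate."""
    p_unigram = unigrams[w] / total
    p_bigram_ml = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    return lam * p_bigram_ml + (1.0 - lam) * p_unigram

tokens = "the cat sat on the mat the cat ate".split()
unigrams, bigrams = train_counts(tokens)
total = sum(unigrams.values())
print(jm_bigram_prob("the", "cat", unigrams, bigrams, total))

Even this crude version shows why interpolation helps: an unseen bigram still receives non-zero probability through the unigram term, rather than the zero that a pure maximum-likelihood model would assign.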
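The evaluation metric is just as easy to sketch: cross-entropy is the average negative log (base 2) probability a model assigns to the test data, and perplexity is two raised to that value, so lower is better. The snippet below is a generic illustration rather than the paper's evaluation code; the uniform model and the 10,000-word vocabulary in the usage line are placeholders for a real smoothed bigram or trigram model such as the interpolated sketch above.

import math

def cross_entropy(test_tokens, prob_fn):
    """Average negative log2 probability per predicted token; lower is better.
    prob_fn(w_prev, w) must return a non-zero smoothed conditional probability."""
    log_prob = sum(math.log2(prob_fn(w_prev, w))
                   for w_prev, w in zip(test_tokens[:-1], test_tokens[1:]))
    return -log_prob / (len(test_tokens) - 1)

test = "the cat sat on the mat".split()
h = cross_entropy(test, lambda w_prev, w: 1.0 / 10000)  # uniform stand-in model
print(f"{h:.2f} bits/token, perplexity {2 ** h:.1f}")   # 13.29 bits/token, perplexity 10000.0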