2009 | Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, David Mimno
The paper evaluates various methods for estimating the probability of held-out documents given a trained model in topic modeling, focusing on latent Dirichlet allocation (LDA). It highlights the limitations of commonly used methods, such as the harmonic mean method and empirical likelihood method, which are often inaccurate and computationally inefficient. The authors propose two alternative methods: a Chib-style estimator and a "left-to-right" evaluation algorithm. These methods are shown to be more accurate and efficient through empirical results on synthetic and real-world datasets. The paper also discusses the sensitivity of these methods to perturbations in the parameters of the model and provides a clear, interpretable metric for evaluating topic models.
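
As a rough illustration of the harmonic mean method named above, the sketch below computes the estimate in log space. It is a minimal sketch and not the authors' code. It assumes that `log_likelihoods` already holds values of log P(w | z_s) for topic assignments z_s drawn from the posterior (for example by Gibbs sampling), and the function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def harmonic_mean_log_evidence(log_likelihoods):
    """Harmonic mean estimate of log P(w).

    Assumption: each entry is log P(w | z_s) for a sample z_s drawn
    from the posterior P(z | w). The estimate is
        P(w) ~= 1 / ((1/S) * sum_s 1 / P(w | z_s)),
    which in log space is log(S) - logsumexp(-log_likelihoods).
    """
    s = len(log_likelihoods)
    if s == 0:
        raise ValueError("need at least one sample")
    negated = [-ll for ll in log_likelihoods]
    return math.log(s) - log_sum_exp(negated)
```

Because the sum is over inverse likelihoods, the lowest-likelihood samples dominate the result, which is one common explanation for why this estimator tends to be unstable in practice.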