June 2010 | David Newman, Jey Han Lau, Karl Grieser, Timothy Baldwin
This paper introduces the novel task of evaluating the coherence of topics generated by topic models. The authors apply a range of topic scoring methods, including those based on WordNet, Wikipedia, and the Google search engine, to assess topic quality. Comparing these methods with human ratings of topics from two distinct datasets, they find that a simple co-occurrence measure based on pointwise mutual information (PMI) over Wikipedia data achieves results close to human inter-annotator correlation. Other Wikipedia-based methods also perform well, while Google produces strong but less consistent results. The study highlights the importance of intrinsic evaluation of topics and concludes that PMI-based term co-occurrence over Wikipedia is the most effective method for evaluating topic coherence.
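The PMI-based coherence score can be illustrated with a minimal sketch: average the pointwise mutual information over all pairs of a topic's top words, with probabilities estimated from sliding-window co-occurrence counts in a reference corpus such as Wikipedia. The function names and toy counts below are assumptions for illustration, not the authors' implementation.

```python
import math
from itertools import combinations

def pmi(w1, w2, pair_counts, word_counts, num_windows):
    """PMI of two words, with probabilities estimated from sliding-window
    counts over a reference corpus (e.g. Wikipedia). A small epsilon
    avoids log(0) for unseen words or pairs."""
    eps = 1e-12
    p_joint = pair_counts.get(frozenset((w1, w2)), 0) / num_windows + eps
    p1 = word_counts.get(w1, 0) / num_windows + eps
    p2 = word_counts.get(w2, 0) / num_windows + eps
    return math.log(p_joint / (p1 * p2))

def topic_coherence(topic_words, pair_counts, word_counts, num_windows):
    """Mean PMI over all pairs of a topic's top-N words."""
    pairs = list(combinations(topic_words, 2))
    return sum(pmi(a, b, pair_counts, word_counts, num_windows)
               for a, b in pairs) / len(pairs)

# Toy usage with made-up counts; real scores use Wikipedia-scale statistics.
word_counts = {"space": 1200, "nasa": 800, "launch": 900}
pair_counts = {frozenset(("space", "nasa")): 300,
               frozenset(("space", "launch")): 250,
               frozenset(("nasa", "launch")): 200}
print(topic_coherence(["space", "nasa", "launch"],
                      pair_counts, word_counts, num_windows=1_000_000))
```

A topic whose top words frequently co-occur in Wikipedia windows gets a high average PMI and is judged coherent; incoherent topics mix words that rarely appear together, driving the score down.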