2004 | Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth
The author-topic model extends Latent Dirichlet Allocation (LDA) to incorporate authorship information, modeling each document as a mixture of topics associated with its authors. Each author is associated with a multinomial distribution over topics, and each topic with a multinomial distribution over words; to generate a word, one of the document's authors is chosen, a topic is drawn from that author's distribution, and the word is drawn from that topic's distribution. Inference is performed with Gibbs sampling.

The model is applied to 1,700 NIPS conference papers and 160,000 CiteSeer abstracts and compared against LDA and a simpler author model. Evaluated by perplexity, it shows better predictive performance on held-out document content than either baseline. Because topics are tied to authors, the learned distributions identify the topics individual authors write about, support similarity measures between authors, and allow entropy calculations over an author's topic distribution to quantify the breadth of their interests.

The author-topic model thus provides a probabilistic framework for answering queries about the relationships between authors, documents, topics, and words, with potential applications including automated reviewer recommendation and author identification. Future directions noted by the authors include incorporating citation information and combining topic models with stylometry models.
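To make the generative story and its inference concrete, the sketch below implements a collapsed Gibbs sampler for the author-topic model in the spirit of the paper, along with a symmetric KL distance of the kind used to compare authors. It is a minimal illustration under stated assumptions, not the authors' code: the function names, the hyperparameter defaults (alpha, beta), and the list-of-word-ids input format are choices made here for clarity.

```python
import numpy as np

def fit_author_topic(docs, doc_authors, n_words, n_authors, n_topics,
                     alpha=0.5, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampler sketch for the author-topic model.

    docs        : list of documents, each a list of word ids
    doc_authors : list of author-id lists, one per document
    Returns (theta, phi): author-topic and topic-word distributions.
    """
    rng = np.random.default_rng(seed)
    n_at = np.zeros((n_authors, n_topics))   # author-topic counts
    n_tw = np.zeros((n_topics, n_words))     # topic-word counts
    n_t = np.zeros(n_topics)                 # tokens per topic
    n_a = np.zeros(n_authors)                # tokens per author

    # Random initialisation: each token gets a topic and one of the
    # document's authors.
    assign = []
    for d, words in enumerate(docs):
        states = []
        for w in words:
            a = rng.choice(doc_authors[d])
            t = rng.integers(n_topics)
            n_at[a, t] += 1; n_tw[t, w] += 1; n_t[t] += 1; n_a[a] += 1
            states.append((a, t))
        assign.append(states)

    for _ in range(n_iters):
        for d, words in enumerate(docs):
            authors = doc_authors[d]
            for i, w in enumerate(words):
                a, t = assign[d][i]
                # Remove the token's current assignment from the counts.
                n_at[a, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1; n_a[a] -= 1
                # Joint conditional over (author, topic) pairs for this token.
                p_wt = (n_tw[:, w] + beta) / (n_t + n_words * beta)        # (T,)
                p_ta = (n_at[authors] + alpha) / \
                       (n_a[authors, None] + n_topics * alpha)             # (A_d, T)
                p = (p_ta * p_wt[None, :]).ravel()
                idx = rng.choice(p.size, p=p / p.sum())
                a, t = authors[idx // n_topics], idx % n_topics
                n_at[a, t] += 1; n_tw[t, w] += 1; n_t[t] += 1; n_a[a] += 1
                assign[d][i] = (a, t)

    theta = (n_at + alpha) / (n_at.sum(1, keepdims=True) + n_topics * alpha)
    phi = (n_tw + beta) / (n_tw.sum(1, keepdims=True) + n_words * beta)
    return theta, phi

def author_distance(theta, a1, a2):
    """Symmetric KL divergence between two authors' topic distributions
    (smaller means more similar interests)."""
    p, q = theta[a1], theta[a2]
    return 0.5 * np.sum(p * np.log(p / q)) + 0.5 * np.sum(q * np.log(q / p))
```

In this sketch, theta[a] estimates an author's distribution over topics and phi[t] the topic's distribution over the vocabulary; ranking authors by author_distance against a query author yields author-similarity lists of the kind the paper reports, and the entropy of theta[a] measures how broad an author's interests are.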