July 25, 2010 | Liangjie Hong and Brian D. Davison
This paper explores the application of topic models to Twitter, a micro-blogging platform where messages are limited to 140 characters. The authors address the challenge of applying standard topic models to such short texts by proposing and comparing several training schemes. They find that training a topic model on aggregated messages from the same user yields better performance and higher-quality topics than training on individual messages. The study also highlights the limitations of the Author-Topic (AT) model in capturing hierarchical relationships between entities in social media. Through a series of experiments, the authors demonstrate that topic models can significantly improve classification performance, particularly in predicting popular messages and in classifying messages and users into topical categories. The paper concludes with a discussion of the effectiveness of different aggregation strategies and of the potential for future research on modeling hierarchical structures in social media data.
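
The user-level aggregation scheme mentioned above can be illustrated with a minimal sketch. It only shows the grouping step: all messages by one user are joined into a single pseudo-document, which could then be passed to a topic model in place of the individual messages. The function name `aggregate_by_user` and the sample messages are hypothetical, and the paper's actual preprocessing is not described in this summary.

```python
from collections import defaultdict


def aggregate_by_user(messages):
    """Group (user, text) pairs into one pseudo-document per user.

    Illustrative assumption: messages arrive as (user, text) tuples.
    """
    grouped = defaultdict(list)
    for user, text in messages:
        grouped[user].append(text)
    # Join each user's messages in their original order.
    return {user: " ".join(texts) for user, texts in grouped.items()}


# Hypothetical sample data, not taken from the paper.
messages = [
    ("alice", "new topic model paper"),
    ("bob", "great game tonight"),
    ("alice", "training LDA on tweets"),
]
user_docs = aggregate_by_user(messages)
```

Each value in `user_docs` is a longer document than any single message, which is the property the aggregated training scheme relies on.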