June 27, 2014 | Carson Sievert, Kenneth E. Shirley
LDAvis is an interactive web-based visualization tool designed to help users understand and interpret topics extracted from Latent Dirichlet Allocation (LDA) models. The tool combines R and D3 to provide a global view of topics and detailed insights into the terms associated with each topic. Key features include:
1. **Global View of Topics**: Topics are visualized as circles in a two-dimensional space, with circle areas proportional to their prevalence in the corpus. Distances between topics are computed and then projected onto two dimensions using multidimensional scaling.
2. **Term Barcharts**: A horizontal barchart displays the most relevant terms for the selected topic, showing both corpus-wide and topic-specific frequencies.
3. **Linked Selections**: Users can select a topic to see the most relevant terms, and selecting a term reveals its conditional distribution over topics.
4. **Relevance Measure**: A novel measure called "relevance" is proposed to rank terms within topics. It balances the probability of a term under a topic against the term's lift, which is the ratio of that probability to the term's overall probability in the corpus and captures how exclusive the term is to the topic. A user study suggests that ranking terms by probability alone is suboptimal for interpretation.
5. **Interactive Exploration**: Users can adjust the weight parameter \(\lambda\), which ranges from 0 to 1, to shift the ranking between probability (\(\lambda = 1\)) and exclusivity (\(\lambda = 0\)), aiding in topic interpretation.
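The global view in item 1 can be approximated in a few lines. The following is a minimal Python/NumPy sketch, not code from the LDAvis package. It rests on three assumptions:

- The input is a topic-term matrix `phi` whose rows are probability distributions over the vocabulary.
- The inter-topic distance is Jensen-Shannon divergence. The R package's default `jsPCA` method also starts from this divergence, but the choice here is an assumption.
- The projection uses classical (Torgerson) multidimensional scaling.

The function names and the toy matrix are hypothetical.

```python
import numpy as np


def jensen_shannon(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        # Entries where a == 0 contribute nothing to KL(a || b).
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_coordinates(phi: np.ndarray) -> np.ndarray:
    """Project topics to 2-D with classical MDS on pairwise JS divergences.

    phi: (K, V) array; row k is topic k's distribution over the vocabulary.
    Returns a (K, 2) array of centre coordinates for the topic circles.
    """
    k = phi.shape[0]
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d[i, j] = d[j, i] = jensen_shannon(phi[i], phi[j])

    # Classical MDS: double-center the squared dissimilarities...
    centering = np.eye(k) - np.ones((k, k)) / k
    b = -0.5 * centering @ (d ** 2) @ centering
    # ...then keep the two largest eigenpairs (eigh returns ascending order).
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:2]
    # Non-Euclidean dissimilarities can yield negative eigenvalues; clip them to 0.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical 3-topic, 4-term example: topics 0 and 1 are similar, topic 2 is not.
phi = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.45],
])
coords = topic_coordinates(phi)
print(coords.round(3))
```

Jensen-Shannon divergence is not a true metric, so the doubly centered matrix can have negative eigenvalues. Clipping them at zero, as above, keeps the coordinates real. Circle areas, which reflect topic prevalence, would be drawn separately at these coordinates.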
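Items 4 and 5 correspond to the relevance formula \(r(w, k \mid \lambda) = \lambda \log(\phi_{kw}) + (1 - \lambda)\log(\phi_{kw}/p_w)\). Here \(\phi_{kw}\) is the probability of term \(w\) under topic \(k\), \(p_w\) is the term's marginal probability in the corpus, and \(0 \le \lambda \le 1\). At \(\lambda = 1\) terms are ranked by topic-specific probability alone. At \(\lambda = 0\) they are ranked by lift, \(\phi_{kw}/p_w\).

The Python sketch below ranks the terms of a single topic. The function names, vocabulary, and probabilities are made up for illustration.

```python
import numpy as np


def relevance(phi_k: np.ndarray, p_w: np.ndarray, lam: float) -> np.ndarray:
    """Relevance of every term to one topic.

    phi_k: probability of each term under topic k (sums to 1).
    p_w:   marginal probability of each term across the corpus.
    lam:   weight in [0, 1]; 1 ranks by probability, 0 ranks by lift.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    log_phi = np.log(phi_k)
    log_lift = log_phi - np.log(p_w)  # log(phi_kw / p_w)
    return lam * log_phi + (1.0 - lam) * log_lift


def top_terms(vocab, phi_k, p_w, lam, n=3):
    """Return the n most relevant terms for the topic, highest first."""
    scores = relevance(phi_k, p_w, lam)
    order = np.argsort(scores)[::-1][:n]
    return [vocab[i] for i in order]


# Hypothetical data: "said" is common everywhere, "goalie" is rare but topic-specific.
vocab = ["said", "game", "goalie", "season"]
phi_k = np.array([0.40, 0.30, 0.10, 0.20])  # term probabilities within one topic
p_w = np.array([0.50, 0.10, 0.01, 0.39])    # term probabilities across the corpus

print(top_terms(vocab, phi_k, p_w, lam=1.0))  # ['said', 'game', 'season']
print(top_terms(vocab, phi_k, p_w, lam=0.6))  # ['game', 'goalie', 'said']
print(top_terms(vocab, phi_k, p_w, lam=0.0))  # ['goalie', 'game', 'said']
```

Lowering \(\lambda\) pushes the rare but topic-specific term "goalie" up the list and drops the generic term "said". This is the trade-off that adjusting \(\lambda\) exposes in the interface.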
The authors conducted a user study to determine the optimal value of \(\lambda\) and found that ranking terms by relevance (where \(\lambda < 1\)) improves topic interpretability. The LDAvis system is available as an R package on GitHub, and future work includes expanding the number of topics visualized and comparing different ranking methods.