2014 | Margaret E. Roberts, Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, David G. Rand
This article introduces the structural topic model (STM), a semiautomated method for analyzing open-ended survey responses. The STM builds on recent advances in machine learning-based text analysis and incorporates information about the document, such as the author's gender, political affiliation, and treatment assignment. It allows researchers to analyze open-ended responses more efficiently, revealing insights and estimating treatment effects. The STM is particularly useful for survey researchers and experimentalists, as it enables the analysis of open-ended data alongside closed-ended data, such as party preferences or treatment conditions.
The STM is compared to traditional methods like latent Dirichlet allocation (LDA), which is an unsupervised method for topic modeling. Unlike LDA, the STM incorporates covariates, allowing for the analysis of how topics vary with respondent characteristics. This makes the STM more versatile for survey and experimental research. The article illustrates the STM's capabilities through analyses of text from surveys and experiments, including a study on immigration preferences and a laboratory experiment on public goods provision.
The STM provides a framework for estimating quantities of interest, such as the prevalence and content of topics in open-ended responses. It allows researchers to examine how treatment conditions affect both which topics respondents discuss and the language they use to discuss them. The article also presents tools for preprocessing text, model selection, and visualization, as well as best practices for human intervention in unsupervised learning.
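As a rough sketch of how covariates can enter topic prevalence and topic content, the generative model is usually written along the following lines in the STM literature. The notation is an assumption made for this summary and may differ from the article's: X_d holds the prevalence covariates for document d, y_d is the level of its content covariate, m_v is a baseline log-frequency for word v, and the kappa terms are topic, covariate, and topic-by-covariate deviations.

```latex
\begin{align*}
\theta_d \mid X_d &\sim \mathrm{LogisticNormal}(X_d \gamma,\ \Sigma) \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d) \\
w_{d,n} \mid z_{d,n} = k &\sim \mathrm{Multinomial}(\beta_{d,k}) \\
\beta_{d,k,v} &\propto \exp\!\left(m_v + \kappa^{(t)}_{k,v} + \kappa^{(c)}_{y_d,v} + \kappa^{(i)}_{y_d,k,v}\right)
\end{align*}
```

Read this way, a treatment indicator placed in X_d shifts how much of a response is devoted to each topic, while the same indicator used as y_d shifts the words used within a topic.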
The article discusses the advantages and limitations of incorporating open-ended responses into research designs, emphasizing the importance of considering both the content and context of the responses. It also highlights the benefits of using the STM for analyzing open-ended data, including its ability to provide more accurate estimates of treatment effects and to reveal insights that might be missed by traditional methods.
The STM is also compared with other approaches, including LDA, factor analysis, and single-membership models. In particular, it estimates quantities of interest more accurately than a two-stage process that first fits LDA and then relates the resulting topics to covariates. The article also validates the model through simulations and examples, showing that it recovers treatment effects accurately.
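To make the two-stage baseline concrete, here is a minimal sketch of it, not of the STM itself and not the authors' software. Stage one fits plain LDA with a small collapsed Gibbs sampler that ignores covariates; stage two compares mean topic proportions between treated and control respondents. The function names, hyperparameters, and toy responses are all hypothetical and chosen only for illustration.

```python
import random


def fit_lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, eta=0.01, seed=0):
    """Stage 1: plain LDA via collapsed Gibbs sampling (no covariates).

    Returns one list of topic proportions per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_words = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}
    docs_ids = [[word_id[w] for w in doc] for doc in docs]

    doc_topic = [[0] * n_topics for _ in docs_ids]         # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k overall
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs_ids):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs_ids):
            for n, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][n]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + eta)
                    / (topic_total[j] + n_words * eta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs_ids)
    ]


def difference_in_means(theta, treated):
    """Stage 2: treated-minus-control mean proportion for each topic."""
    t_rows = [row for row, t in zip(theta, treated) if t]
    c_rows = [row for row, t in zip(theta, treated) if not t]
    return [
        sum(r[k] for r in t_rows) / len(t_rows)
        - sum(r[k] for r in c_rows) / len(c_rows)
        for k in range(len(theta[0]))
    ]


# Hypothetical toy responses, invented for illustration (not data from the article).
docs = [
    "jobs crime border worry".split(),
    "crime worry jobs taxes".split(),
    "border crime worry jobs".split(),
    "family culture food welcome".split(),
    "culture welcome family work".split(),
    "food family welcome culture".split(),
]
treated = [1, 1, 1, 0, 0, 0]

theta = fit_lda_gibbs(docs, n_topics=2)
effects = difference_in_means(theta, treated)
print("Estimated treated-minus-control topic proportions:", effects)
```

Because the first stage never sees the treatment indicator, the topics are estimated as if respondents were exchangeable, and the second stage treats the estimated proportions as if they were observed data. The STM instead estimates topics and covariate effects jointly, which is the contrast the comparison above draws.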
The STM is applied to real-world data, including analyses of immigration preferences and public goods provision, showing how it can be used to estimate treatment effects and understand the language used to discuss topics. The model is also used to examine how different groups, such as men and women, use different vocabulary to describe their intuition, highlighting the importance of considering covariates in topic modeling.
Overall, the STM provides a powerful tool for analyzing open-ended survey responses, offering a balance between automation and human interpretation. It allows researchers to uncover insights from text data more efficiently and accurately, making it a valuable addition to the toolkit of survey researchers and experimentalists.