2014 | Margaret E. Roberts, Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, David G. Rand
The article introduces the Structural Topic Model (STM), a semi-automated approach for analyzing open-ended survey responses, which leverages recent advancements in machine learning for textual data analysis. Unlike traditional methods that rely heavily on human coding, the STM incorporates information about the document, such as the author's gender, political affiliation, and treatment assignment, making the analysis more efficient and revealing. The STM is particularly useful for survey researchers and experimentalists, as it can estimate treatment effects and provide insights into how topical prevalence and content vary with different covariates. The article demonstrates the STM's effectiveness through various experiments and an analysis of open-ended data from the American National Election Study (ANES). The STM allows for the discovery of topics from the data, rather than assuming them, and can be used to measure systematic changes in topical prevalence and content over different conditions. The model is estimated using a variational expectation-maximization approach and includes shrinkage priors to handle overfitting. The article also discusses model specification, selection, and validation methods, and compares the STM to other statistical topic models and supervised learning techniques. Overall, the STM offers a flexible and powerful tool for analyzing open-ended survey responses, providing interpretable quantities of interest and facilitating the estimation of treatment effects.