Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach

September 25, 2013 | H. Andrew Schwartz, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah, Michal Kosinski, David Stillwell, Martin E. P. Seligman, Lyle H. Ungar
This study analyzes 700 million words, phrases, and topics from 75,000 Facebook users, who also took standard personality tests, to explore how language varies with personality, gender, and age. In the open-vocabulary approach, the data itself drives the exploration of language that distinguishes people, revealing connections not captured by traditional closed-vocabulary (a priori word-category) methods. The study finds that personality, gender, and age are strongly linked to language use, with results that are face valid, align with other research, suggest new hypotheses, and provide detailed insights. For example, males use the possessive 'my' more frequently when mentioning their 'wife' or 'girlfriend' than females use 'my' with 'husband' or 'boyfriend'. The study is presented as the largest analysis of language and personality to date. Words, phrases, and topics are extracted from the data and correlated with gender, age, and personality, and a word cloud-based technique visualizes the results of this differential language analysis (DLA). Open-vocabulary features also contain more information than a priori word categories, as demonstrated by their use in predictive models. The study provides a comprehensive set of correlations for future research.
The study concludes that open-vocabulary approaches are more effective for gaining insights into psychosocial variables than traditional closed-vocabulary approaches.
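The core idea of correlating data-derived words with a trait can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study: the function names, the simple tokenizer, the use of a plain Pearson correlation, and the toy texts and scores are all hypothetical and invented for illustration. The study's actual statistical procedure may differ.

```python
import math
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word in `text` to its share of the text's total word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def differential_language(texts, outcome):
    """Correlate every observed word's relative frequency with `outcome`.

    The vocabulary comes from the texts themselves (open vocabulary),
    not from a predefined word-category list.
    """
    freqs = [relative_frequencies(t) for t in texts]
    vocabulary = set().union(*freqs)
    return {
        word: pearson([f.get(word, 0.0) for f in freqs], outcome)
        for word in vocabulary
    }


# Hypothetical toy data, invented for illustration only (one text per user).
toy_texts = [
    "party tonight with friends",
    "reading a book at home",
    "great party last night",
    "quiet evening reading alone",
]
toy_extraversion = [4.5, 2.0, 4.0, 1.5]  # made-up trait scores

correlations = differential_language(toy_texts, toy_extraversion)
top_words = sorted(correlations, key=correlations.get, reverse=True)[:3]
```

In this toy setup, words with the highest positive correlations would be the ones a word cloud draws most prominently for the trait. A realistic analysis would also need phrase and topic features, far more users, and correction for the many comparisons being made.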