Understanding the Demographics of Twitter Users

2011 | Alan Mislove, Sune Lehmann, Yong-Yeol Ahn, Jukka-Pekka Onnela, J. Niels Rosenquist
This paper investigates the demographics of Twitter users, analyzing over 1.7 billion tweets sent by 54.9 million users between 2006 and 2009. The study focuses on U.S. users, as they represent over 1% of the U.S. population, and compares them to that population along three axes: geography, gender, and race/ethnicity.

Geographically, the distribution of Twitter users is highly non-uniform: populous counties are overrepresented and sparsely populated counties are underrepresented, which suggests that entire regions of the U.S. may be significantly underrepresented in the Twitter population. In terms of gender, the study finds a significant male bias, with 71.8% of the users whose name could be matched having a male name, although this bias is decreasing over time. For race/ethnicity, the study infers categories from last names; because last names are ambiguous, direct comparisons to the U.S. Census are not possible. Relative comparisons between regions nevertheless show that Hispanic users are undersampled in the southwest, African-American users in the south and midwest, and Caucasian users in many major cities.

Overall, the findings suggest that Twitter may not be a representative sample of the overall population, which has implications for using Twitter data for predictions and measurements. The study sets a foundation for future research on Twitter data and its potential as a tool for inferring population characteristics.
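
The geographic and gender comparisons summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the name lists, and the toy county figures are hypothetical, and the sketch assumes that over- or underrepresentation is measured as a county's per-capita user rate divided by the overall rate, and that the male and female name lists do not overlap.

```python
from typing import Dict, Iterable, Optional, Set


def representation_ratios(
    users_by_county: Dict[str, int],
    population_by_county: Dict[str, int],
) -> Dict[str, float]:
    """Per-capita user rate of each county divided by the overall rate.

    A ratio above 1.0 means the county is overrepresented relative to its
    population; a ratio below 1.0 means it is underrepresented.
    Assumes the total population and total user count are positive.
    """
    total_users = sum(users_by_county.get(c, 0) for c in population_by_county)
    total_population = sum(population_by_county.values())
    overall_rate = total_users / total_population
    return {
        county: (users_by_county.get(county, 0) / pop) / overall_rate
        for county, pop in population_by_county.items()
        if pop > 0
    }


def male_fraction_among_matched(
    first_names: Iterable[str],
    male_names: Set[str],
    female_names: Set[str],
) -> Optional[float]:
    """Fraction of name-matched users whose first name is on the male list.

    Users whose name appears on neither list are left out of the
    denominator. Returns None when no name matches at all.
    """
    male = female = 0
    for name in first_names:
        key = name.strip().lower()
        if key in male_names:
            male += 1
        elif key in female_names:
            female += 1
    matched = male + female
    return male / matched if matched else None


if __name__ == "__main__":
    # Hypothetical toy numbers, not taken from the paper.
    users = {"urban": 900, "rural": 100}
    population = {"urban": 60_000, "rural": 40_000}
    print(representation_ratios(users, population))

    names = ["Alice", "bob", "Carlos", "xx_gamer_xx"]
    print(male_fraction_among_matched(names, {"bob", "carlos"}, {"alice"}))
```

The second function makes the denominator explicit: a figure such as 71.8% describes only users whose name could be matched, not all users in the dataset.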