27 Mar 2017 | Onur Varol, Emilio Ferrara, Clayton A. Davis, Filippo Menczer, Alessandro Flammini
This paper presents a framework for detecting social bots on Twitter. The framework extracts more than 1,150 features from public data and metadata about users, spanning user metadata, friends, tweet content and sentiment, network patterns, and activity time series. These features are used to train highly accurate models that separate bot accounts from human ones. The system is evaluated against a publicly available dataset of Twitter bots as well as manually annotated accounts, and applying it to a sample of active accounts suggests that between 9% and 15% of active Twitter accounts are bots.

Beyond detection, the paper characterizes interactions between bot and human accounts, finding that simple bots tend to interact with bots that exhibit more human-like behaviors. An analysis of content flows shows that bots use distinct retweet and mention strategies to engage different target groups. Clustering the detected accounts reveals several subclasses, including spammers, self-promoters, and accounts that automatically post content from connected applications.

The system is further evaluated across different datasets and classification models, showing high accuracy and strong agreement between models. The authors stress the importance of tracking increasingly sophisticated bots, since deception and detection technologies are locked in a never-ending arms race; their detection system is publicly available and has served millions of requests. They also discuss the main challenges of bot detection, including false positives and false negatives, and the need to continuously retrain models on new data. Overall, the study provides insight into the behavior of social bots, their impact on online discussions, and their role in online communication.
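At its core, the detection pipeline is supervised classification over per-account feature vectors. Below is a minimal sketch of that idea using scikit-learn; the feature names, toy data, and labels are illustrative assumptions, not the authors' code (the real system computes more than 1,150 features, and the paper evaluates models with cross-validated AUC):

```python
# Sketch of feature-based bot classification, in the spirit of the paper's
# approach. Features and data here are toy stand-ins; the real system uses
# 1,150+ features spanning metadata, content, sentiment, network, and timing.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-account features, one row per account:
# [followers/friends ratio, tweets per day, mean sentiment, retweet fraction]
X = rng.random((500, 4))

# Toy labels with a weak dependence on the first feature, so the model
# has something to learn; 1 = bot, 0 = human.
y = (X[:, 0] + 0.3 * rng.standard_normal(500) > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# Evaluate with cross-validated AUC, mirroring the paper's metric.
auc_scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC: {auc_scores.mean():.3f}")
```

A random forest is used here because it is among the classifiers the paper compares; on real labeled accounts the same pipeline would simply swap in the full feature matrix and annotated labels.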
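The subclasses mentioned above (spammers, self-promoters, application-driven posters) emerge from clustering accounts in feature space. The following is a hedged sketch of that step, assuming the same kind of feature matrix as in the previous example; the specific algorithm (KMeans) and the silhouette heuristic for choosing the number of clusters are illustrative choices, not a reproduction of the paper's exact procedure:

```python
# Sketch of clustering bot accounts into behavioral subclasses.
# KMeans and the silhouette criterion are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_bots = rng.random((300, 4))  # feature vectors of accounts scored as bots
X_scaled = StandardScaler().fit_transform(X_bots)

# Choose the number of clusters by silhouette score, a common heuristic.
best_k, best_score = None, -1.0
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X_scaled)
    score = silhouette_score(X_scaled, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k} (silhouette = {best_score:.2f})")
```

Inspecting the feature averages within each resulting cluster is what lets analysts attach interpretable labels such as "spammer" or "self-promoter."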
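The publicly available system referenced above became known as Botometer (originally BotOrNot). Access historically went through the `botometer` Python client; the snippet below follows that client's documented usage, but the hosting, credentials, and availability of the service have changed since publication, so treat this as a sketch rather than a working recipe:

```python
# Sketch of querying the public bot-scoring service via the `botometer`
# Python client. All credentials are placeholders, and the service's
# endpoint and authentication have changed over time.
import botometer

twitter_app_auth = {
    "consumer_key": "...",
    "consumer_secret": "...",
    "access_token": "...",
    "access_token_secret": "...",
}

bom = botometer.Botometer(
    wait_on_ratelimit=True,
    rapidapi_key="...",  # key for the hosted API
    **twitter_app_auth,
)

# Score a single account; the response includes overall and
# per-category bot scores.
result = bom.check_account("@example_account")
print(result)
```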