The Largest Social Media Ground-Truth Dataset for Real/Fake Content: TruthSeeker

June 2024 | Sajjad Dadkhah, Member, IEEE, Xichen Zhang, Alexander Gerald Weismann, Amir Firouzi, and Ali A. Ghorbani, Senior Member, IEEE
The paper introduces TruthSeeker, a comprehensive dataset for detecting fake content on social media, particularly Twitter. The dataset, built through crawling and crowd-sourcing and covering 2009 to 2022, contains over 180,000 labels, verified through a three-factor active learning method involving Amazon Mechanical Turk. The authors implemented various machine learning and deep learning algorithms, including BERT-based models, to evaluate the accuracy of real/fake tweet detection. They also introduced three auxiliary social media scores (bot score, credibility score, and influence score) to characterize users and their impact on content. Clustering analysis using DBSCAN, combined with YAKE keyword extraction, further explored the relationships between topics and tweet labels. The results demonstrate significant improvements in detecting fake content, even with short tweets, and provide valuable insights for enhancing the precision of fake news detection models. The dataset is available for public use, contributing to the ongoing effort to combat misinformation on social media platforms.
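
The summary mentions DBSCAN clustering without saying how it groups tweets. The sketch below illustrates the general idea of density-based clustering on a handful of made-up tweets. It is not the authors' pipeline: the toy tweets, the Jaccard distance over word sets, and the `eps` and `min_samples` values are all assumptions chosen for illustration, and the implementation uses only the Python standard library rather than any particular clustering package.

```python
from typing import List, Set

UNSEEN = -2  # internal marker: point not visited yet
NOISE = -1   # label for points that belong to no cluster


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b| (0.0 for identical sets, 1.0 for disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(points: List[Set[str]], eps: float, min_samples: int) -> List[int]:
    """Minimal DBSCAN: return one cluster label per point, NOISE for outliers."""
    n = len(points)
    labels = [UNSEEN] * n

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i (including i itself).
        return [j for j in range(n)
                if jaccard_distance(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] != UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
                continue
            if labels[j] != UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core point: expand
                queue.extend(reachable)
        cluster += 1
    return labels


# Hypothetical toy tweets, invented for this example (not from TruthSeeker).
tweets = [
    "vaccine causes illness claim",
    "vaccine causes illness rumor",
    "vaccine causes illness hoax claim",
    "election results certified today",
    "election results certified officially",
    "election results certified today officially",
    "cat video goes viral",
]
token_sets = [set(t.lower().split()) for t in tweets]
labels = dbscan(token_sets, eps=0.6, min_samples=2)

for tweet, label in zip(tweets, labels):
    print(label, tweet)
```

With these assumed parameters, the tweets sharing most of their words fall into the same cluster, while the unrelated tweet has no close neighbors and is labeled as noise. A real analysis would more likely cluster on learned text embeddings and tune `eps` against the data.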