22 April 2024 | Jianqiao Lai, Xinran Yang, Wenyue Luo, Linjiang Zhou, Langchen Li, Yongqi Wang and Xiaochuan Shi
This paper introduces a novel approach to fake news detection built on RumorLLM, a large language model fine-tuned on the writing style and content of rumors. The key contributions are RumorLLM itself and a data-augmentation method for small categories, which mitigates the category imbalance found in real-world fake-news datasets. Using RumorLLM and prompt engineering, the method diversifies and enlarges the under-represented categories of samples, improving the model's ability to discriminate complex rumors generated by artificial intelligence. The approach combines RumorLLM with state-of-the-art classification models and is validated on the real-world BuzzFeed and PolitiFact datasets, where the proposed model outperforms baseline methods in accuracy, precision, recall, F1 score, and AUC-ROC. This robust performance highlights its effectiveness on imbalanced datasets and offers a promising solution to the proliferation of false information. The paper also discusses the limitations of the current approach, including its focus on plain text and the potential ethical implications of using RumorLLM to generate rumors. Future research directions include extending the model to multimodal content, improving its interpretability, and addressing ethical considerations. Overall, the study highlights the potential of RumorLLM to enhance fake news detection and address emerging challenges in combating misinformation.
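The summary reports F1 score and AUC-ROC rather than accuracy alone, which matters when categories are imbalanced. The following is a minimal sketch of why, not the paper's evaluation code: the labels and scores are made-up toy values, the fake class (label 1) is assumed to be the small category, and the helper names `accuracy`, `f1_score`, and `auc_roc` are hypothetical.

```python
# Toy illustration (hypothetical data): metrics on an imbalanced label set.
# Label 1 = fake (assumed minority), label 0 = real.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc_roc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as half). Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 8 real items and 2 fake items: an imbalanced toy set.
y_true = [0] * 8 + [1] * 2

# A degenerate classifier that labels everything as real.
always_real = [0] * 10

# Made-up scores from a hypothetical classifier (higher = more likely fake).
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.7, 0.6, 0.9]

print("accuracy (always real):", accuracy(y_true, always_real))
print("F1 (always real):      ", f1_score(y_true, always_real))
print("AUC-ROC (toy scores):  ", auc_roc(y_true, scores))
```

In this toy set, the classifier that never flags anything as fake still reaches high accuracy while its F1 score for the fake class is zero, which is the kind of failure that imbalance-aware metrics expose.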