October 28-30, 2024 | Takashi Koide, Naoki Fukushi, Hiroki Nakano, and Daiki Chiba
This paper introduces CHATSPAMDETECTOR, a system that uses large language models (LLMs) to detect phishing emails. The system converts email data into prompts suitable for LLM analysis, enabling highly accurate determination of whether an email is phishing. It also provides detailed reports with the reasoning behind each determination, helping users make informed decisions about suspicious emails. The system was evaluated on a comprehensive phishing email dataset and compared against several LLMs and baseline systems. With GPT-4, it achieved an accuracy of 99.70%, outperforming the other models and the baseline systems. Its ability to extract key indicators from email headers and bodies, prioritize them, and generate accurate responses confirms its effectiveness in phishing detection. The system can identify various phishing tactics, including sender spoofing and brand impersonation, and reveal techniques attackers use to evade malicious email filters. The contributions of this paper are the proposal of CHATSPAMDETECTOR, the collection of recent phishing and legitimate emails to create datasets, and a detailed analysis of LLM responses, which shows their sophisticated ability to extract important information from email headers and bodies. The system represents a new step in the fight against phishing emails, as it can replace or complement the existing phishing detection features of email services. The paper also discusses the limitations of the system, including the scope of phishing emails and LLM parameters, and suggests ways to improve response accuracy with Retrieval-Augmented Generation (RAG).
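To make the idea of converting email data into a prompt concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not the authors' implementation: the function name `build_prompt`, the header selection in `KEY_HEADERS`, the body length limit, and the prompt wording are all hypothetical, and the sample message is invented for the example.

```python
from email import message_from_string, policy

# Hypothetical header selection; the abstract does not list the fields used.
KEY_HEADERS = ("From", "Reply-To", "Return-Path", "Subject", "Authentication-Results")


def build_prompt(raw_email: str, max_body_chars: int = 2000) -> str:
    """Turn a raw RFC 5322 message into a plain-text prompt for an LLM."""
    msg = message_from_string(raw_email, policy=policy.default)

    # Keep only the selected headers that are actually present.
    header_lines = [
        f"{name}: {msg[name]}" for name in KEY_HEADERS if msg[name] is not None
    ]

    # Prefer the plain-text part; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    body = body.strip()[:max_body_chars]  # truncate to limit prompt size

    # Assumed instruction text; the real system's prompt is not given here.
    return (
        "You are an email security analyst. Decide whether the email below "
        "is phishing. Answer with a verdict (phishing or legitimate) and "
        "the indicators that support it.\n\n"
        "## Headers\n" + "\n".join(header_lines) + "\n\n"
        "## Body\n" + body
    )


# Invented sample message using the reserved .test top-level domain.
SAMPLE = (
    "From: Example Bank <support@examp1e-bank.test>\n"
    "Reply-To: attacker@mailbox.test\n"
    "Subject: Urgent: verify your account\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Your account will be suspended. Visit http://examp1e-bank.test/login now.\n"
)

prompt = build_prompt(SAMPLE)
print(prompt)
```

The resulting string would then be sent to whichever LLM is being evaluated. Truncating the body is one simple way to stay within a model's context limit; a real system would likely need more careful handling of HTML and attachments.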