Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization

2018 | Iman Sharafaldin, Arash Habibi Lashkari and Ali A. Ghorbani
This paper addresses the need for a comprehensive and reliable dataset for evaluating intrusion detection systems (IDS). The authors analyze eleven publicly available IDS datasets released between 1998 and 2016 and highlight their limitations, such as a lack of traffic diversity, outdated attack types, and incomplete feature sets. They propose a new dataset, CICIDS2017, which includes seven common attack scenarios and covers all eleven criteria for a valid IDS dataset. The dataset is generated on a network setup consisting of a Victim-Network and a separate Attack-Network, with the aim of producing realistic traffic and diverse attacks. The paper evaluates the dataset by extracting 80 network traffic features, selecting the best feature set for detecting each attack, and applying machine learning algorithms whose performance is assessed using precision, recall, and F1 score. The generated dataset is then compared with the existing datasets, and the authors report that it is more complete and more realistic. They conclude by emphasizing the importance of a reliable IDS dataset and suggest future improvements, such as increasing the number of PCs and incorporating more recent attacks.
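The three metrics named above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `precision_recall_f1`, the `"attack"`/`"benign"` label strings, and the toy label lists are all assumptions made for illustration, and the example treats detection as a single-positive-class problem.

```python
def precision_recall_f1(y_true, y_pred, positive="attack"):
    """Compute precision, recall, and F1 for one positive class.

    Hypothetical helper for illustration; labels are plain strings.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: attacks correctly flagged as attacks.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-attack flows flagged as attacks.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: attacks that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels, invented for this example (not taken from CICIDS2017).
y_true = ["attack", "attack", "benign", "benign", "attack", "benign"]
y_pred = ["attack", "benign", "benign", "attack", "attack", "benign"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With several attack classes, as in a dataset covering multiple attack scenarios, one common choice is to compute these values per class and then average them; the summary does not state which averaging the authors used.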