Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization

This paper presents a new intrusion detection dataset, CICIDS2017, which contains benign traffic and seven common attack network flows. The dataset is publicly available and is designed to meet real-world criteria. It provides more than 80 network traffic features extracted with the CICFlowMeter software. The paper evaluates a comprehensive set of network traffic features and machine learning algorithms to determine the best features for detecting specific attack categories.

The authors analyze eleven publicly available IDS datasets released since 1998 and find that many are outdated or unreliable and lack traffic diversity, attack variety, or metadata. CICIDS2017 addresses these issues by covering the following criteria: complete network configuration, complete traffic, labeled data, complete interaction, complete capture, available protocols, attack diversity, heterogeneity, feature set, and metadata. The dataset is generated on a testbed with two networks, Attack-Network and Victim-Network, and includes attack scenarios such as brute force, DoS, Heartbleed, web attack, infiltration, botnet, and DDoS. The paper evaluates the dataset with seven common machine learning algorithms and finds that Random Forest is the best algorithm for detecting attacks. Compared with other public datasets, CICIDS2017 is found to be more comprehensive and reliable. The authors conclude that a reliable, publicly available IDS evaluation dataset is essential for researchers and producers in this domain.
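
To make the idea of flow-level traffic features more concrete, the sketch below groups packets into bidirectional flows and computes a handful of per-flow statistics. It is only an illustration under stated assumptions and not CICFlowMeter's implementation: the packet records, the `flow_key` and `extract_flow_features` helpers, and the feature names are all hypothetical, and the real tool extracts many more features than shown here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet records (not taken from the dataset):
# (timestamp_s, src_ip, dst_ip, src_port, dst_port, protocol, length_bytes)
PACKETS = [
    (0.00, "10.0.0.5", "192.168.10.50", 40000, 80, "TCP", 60),
    (0.10, "192.168.10.50", "10.0.0.5", 80, 40000, "TCP", 1500),
    (0.25, "10.0.0.5", "192.168.10.50", 40000, 80, "TCP", 40),
    (1.00, "10.0.0.7", "192.168.10.50", 40001, 22, "TCP", 100),
    (1.50, "10.0.0.7", "192.168.10.50", 40001, 22, "TCP", 100),
]


def flow_key(src_ip, dst_ip, src_port, dst_port, protocol):
    """Build a direction-independent key so both directions share one flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), protocol)


def extract_flow_features(packets):
    """Group packets into bidirectional flows and compute simple statistics."""
    flows = defaultdict(list)
    for ts, src_ip, dst_ip, sport, dport, proto, length in packets:
        key = flow_key(src_ip, dst_ip, sport, dport, proto)
        flows[key].append((ts, src_ip, length))

    features = {}
    for key, pkts in flows.items():
        pkts.sort()  # order by timestamp
        times = [p[0] for p in pkts]
        lengths = [p[2] for p in pkts]
        initiator = pkts[0][1]  # sender of the first packet defines "forward"
        fwd_count = sum(1 for p in pkts if p[1] == initiator)
        features[key] = {
            "duration_s": times[-1] - times[0],
            "total_packets": len(pkts),
            "fwd_packets": fwd_count,
            "bwd_packets": len(pkts) - fwd_count,
            "total_bytes": sum(lengths),
            "mean_packet_len": mean(lengths),
        }
    return features


if __name__ == "__main__":
    for key, feats in extract_flow_features(PACKETS).items():
        print(key, feats)
```

Each resulting feature dictionary corresponds to one row of a flow table; in a labeled dataset, a class column (benign or an attack type) would be attached to each row before training a classifier.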

2018 | Iman Sharafaldin, Arash Habibi Lashkari and Ali A. Ghorbani