Intrusion detection with unlabeled data using clustering

Leonid Portnoy
This paper presents a method for detecting intrusions using clustering techniques on unlabeled data. The system aims to automatically identify new and unknown types of intrusions without requiring manual classification or prior knowledge of attack types. The approach leverages anomaly detection, which identifies deviations from normal network behavior, to detect intrusions. The method involves clustering similar data instances and using distance metrics to determine anomalies. The system was evaluated using the KDD Cup 1999 dataset, achieving a detection rate of 40%-55% with a false positive rate of 1.3%-2.3%.
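
To make the two reported figures concrete, the sketch below computes them under the usual definitions: detection rate is the share of true intrusions that are flagged, and false positive rate is the share of normal instances that are wrongly flagged. The function name and the label strings "intrusion" and "normal" are illustrative assumptions, not taken from the paper.

```python
def detection_metrics(true_labels, predicted_labels):
    """Return (detection_rate, false_positive_rate) for two label sequences.

    Labels are the hypothetical strings "intrusion" and "normal".
    """
    pairs = list(zip(true_labels, predicted_labels))
    intrusions = sum(1 for t, _ in pairs if t == "intrusion")
    normals = sum(1 for t, _ in pairs if t == "normal")
    # Intrusions that the detector correctly flagged.
    detected = sum(1 for t, p in pairs if t == "intrusion" and p == "intrusion")
    # Normal instances that the detector wrongly flagged.
    false_alarms = sum(1 for t, p in pairs if t == "normal" and p == "intrusion")
    detection_rate = detected / intrusions if intrusions else 0.0
    false_positive_rate = false_alarms / normals if normals else 0.0
    return detection_rate, false_positive_rate


# Toy data for illustration only: 2 intrusions and 4 normal instances.
truth = ["intrusion", "intrusion", "normal", "normal", "normal", "normal"]
guess = ["intrusion", "normal", "normal", "intrusion", "normal", "normal"]
rate, fpr = detection_metrics(truth, guess)
```

The two numbers trade off against each other: flagging more aggressively tends to raise the detection rate and the false positive rate together, which is why they are reported as a pair.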
The paper discusses the methodology, including data normalization, metric selection, clustering, and labeling clusters. It also details the evaluation process, including parameter tuning and cross-validation. The results show that the system performs well, especially when the training set represents a wide variety of intrusion and normal subtypes. The paper concludes by highlighting the advantages of the method over traditional signature-based classifiers and labeled data learning algorithms, emphasizing its automation and adaptability to new intrusions. Future work includes improving the system's performance and automation through various extensions.
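
The pipeline named above (normalization, a distance metric, clustering, cluster labeling) can be sketched roughly as follows. This summary does not specify the clustering algorithm or the labeling rule, so the sketch assumes a single-pass fixed-width clustering with Euclidean distance and labels the largest clusters as normal, on the assumption that normal traffic far outnumbers intrusions. All function names and the parameters `width` and `normal_fraction` are hypothetical.

```python
import math
from statistics import mean, pstdev


def normalize(rows, stats=None):
    """Standardize each feature column using (mean, std) pairs.

    When `stats` is None it is computed from `rows`; pass the training
    statistics back in to scale new instances the same way.
    """
    if stats is None:
        cols = list(zip(*rows))
        # Use 1.0 for constant columns to avoid dividing by zero.
        stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    scaled = [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]
    return scaled, stats


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_fixed_width(rows, width):
    """Assumed scheme: join the nearest cluster if within `width`, else start a new one."""
    clusters = []  # each cluster: {"center": [...], "count": int}
    for row in rows:
        best = min(clusters, key=lambda c: euclidean(row, c["center"]), default=None)
        if best is not None and euclidean(row, best["center"]) <= width:
            n = best["count"]
            # Update the running mean of the cluster center.
            best["center"] = [(c * n + x) / (n + 1) for c, x in zip(best["center"], row)]
            best["count"] = n + 1
        else:
            clusters.append({"center": list(row), "count": 1})
    return clusters


def label_clusters(clusters, normal_fraction):
    """Assumed heuristic: the largest clusters are 'normal', the rest 'intrusion'."""
    ranked = sorted(clusters, key=lambda c: c["count"], reverse=True)
    n_normal = max(1, round(normal_fraction * len(ranked)))
    for i, cluster in enumerate(ranked):
        cluster["label"] = "normal" if i < n_normal else "intrusion"
    return clusters


def classify(row, clusters, stats):
    """Scale a new instance with the training stats and take the nearest cluster's label."""
    scaled, _ = normalize([row], stats)
    nearest = min(clusters, key=lambda c: euclidean(scaled[0], c["center"]))
    return nearest["label"]


# Toy two-feature data for illustration only: six similar rows and one outlier.
train = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.0],
         [1.0, 2.2], [0.8, 1.8], [50.0, 60.0]]
scaled, stats = normalize(train)
clusters = label_clusters(cluster_fixed_width(scaled, width=1.0), normal_fraction=0.5)
print(classify([1.0, 2.0], clusters, stats))
print(classify([48.0, 61.0], clusters, stats))
```

In a scheme like this, `width` and `normal_fraction` are the kind of parameters that would need tuning, and the labeling heuristic breaks down if intrusions dominate the training data.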