Intrusion detection with unlabeled data using clustering

Leonid Portnoy
This paper presents a framework for automatically detecting network intrusions, including new and previously unknown attack types, from unlabeled data. The system does not require manually classified training data, yet it can detect a variety of intrusion types while keeping the false positive rate low. The method groups similar data instances into clusters and uses a distance metric to identify anomalies. It assumes that instances of the same type (normal or attack) lie close together in feature space, while instances of different types lie far apart.

The system is trained and tested on the KDD Cup 99 dataset, where it achieves a detection rate of roughly 40% to 55% at a false positive rate of 1.3% to 2.3%. It is evaluated with cross-validation and performs well across different training and test sets. The paper also reviews related work on clustering and anomaly detection, and analyzes the system's performance, the trade-off between detection rate and false positive rate, and variations of the clustering and detection methods.

The results show that performance depends on the training set used, and that the assumption that normal data clusters together does not always hold, which leads to some misclassifications. The paper concludes that the system is useful for detecting intrusions without labeled data and can be implemented as one component of a larger intrusion detection system. Future work includes improving the system's performance and further automating it.
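The clustering idea can be illustrated with a minimal sketch. The summary does not specify the clustering algorithm, distance metric, or labeling rule, so everything below is an assumption rather than the paper's method: features are z-score normalized using statistics from the unlabeled training data, distances are Euclidean, clusters are formed in a single pass with a fixed width (the hypothetical `width` parameter), and the largest clusters (the hypothetical `normal_fraction` of all clusters) are labeled normal on the premise that normal traffic dominates the data. A test instance is flagged as an attack when its nearest cluster is not labeled normal. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fit_normalizer(train: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation of the unlabeled training data."""
    n, dims = len(train), len(train[0])
    means = [sum(row[j] for row in train) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((row[j] - means[j]) ** 2 for row in train) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid dividing by zero
    return means, stds


def normalize(row: Point, means: List[float], stds: List[float]) -> List[float]:
    """Express each feature as a number of standard deviations from its mean."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fixed_width_clusters(points: Sequence[Point], width: float) -> Tuple[List[List[float]], List[int]]:
    """Assumed single-pass scheme: add each point to the nearest cluster whose
    center lies within `width`; otherwise start a new cluster centered on it."""
    centers: List[List[float]] = []
    sizes: List[int] = []
    for p in points:
        dists = [euclidean(p, c) for c in centers]
        if dists and min(dists) <= width:
            sizes[dists.index(min(dists))] += 1
        else:
            centers.append(list(p))
            sizes.append(1)
    return centers, sizes


def label_clusters(sizes: List[int], normal_fraction: float) -> List[bool]:
    """Assumed rule: label the largest `normal_fraction` of clusters as normal (True)."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    normal = set(order[: max(1, math.ceil(normal_fraction * len(sizes)))])
    return [i in normal for i in range(len(sizes))]


def classify(point: Point, centers: List[List[float]], is_normal: List[bool]) -> bool:
    """Return True (attack) if the nearest cluster is not labeled normal."""
    nearest = min(range(len(centers)), key=lambda i: euclidean(point, centers[i]))
    return not is_normal[nearest]


if __name__ == "__main__":
    # Toy, made-up data: four nearby points and one distant outlier.
    train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
    means, stds = fit_normalizer(train)
    centers, sizes = fixed_width_clusters([normalize(r, means, stds) for r in train], width=1.0)
    is_normal = label_clusters(sizes, normal_fraction=0.5)
    print(sizes, is_normal)  # [4, 1] [True, False]
    print(classify(normalize([0.05, 0.05], means, stds), centers, is_normal))  # False
    print(classify(normalize([5.0, 4.9], means, stds), centers, is_normal))  # True
```

Because clusters are labeled purely by size, this sketch inherits the weakness the summary reports: when normal traffic does not form large, tight clusters, normal instances end up in small clusters and are flagged as false positives.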
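The detection and false positive rates reported in the summary are standard ratios: the detection rate is the fraction of attack instances that are flagged, and the false positive rate is the fraction of normal instances that are wrongly flagged. Computing them requires ground-truth labels for the test set, even though training does not. The sketch below computes both and illustrates the trade-off between them by sweeping a threshold over per-instance anomaly scores; the scoring scheme, the helper names `rates` and `sweep`, and the toy data are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def rates(flagged: Sequence[bool], is_attack: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection rate, false positive rate) for one set of decisions."""
    attacks = sum(is_attack)
    normals = len(is_attack) - attacks
    true_pos = sum(f and a for f, a in zip(flagged, is_attack))
    false_pos = sum(f and not a for f, a in zip(flagged, is_attack))
    dr = true_pos / attacks if attacks else 0.0
    fpr = false_pos / normals if normals else 0.0
    return dr, fpr


def sweep(scores: Sequence[float], is_attack: Sequence[bool],
          thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """For each threshold, flag instances whose (hypothetical) anomaly score is
    at least the threshold, and report (threshold, DR, FPR)."""
    return [(t, *rates([s >= t for s in scores], is_attack)) for t in thresholds]


if __name__ == "__main__":
    # Made-up scores: larger means more suspicious.
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
    is_attack = [True, False, True, False, False, False]
    for t, dr, fpr in sweep(scores, is_attack, [0.85, 0.5, 0.25]):
        print(f"threshold={t}: DR={dr:.2f}, FPR={fpr:.2f}")
    # threshold=0.85: DR=0.50, FPR=0.00
    # threshold=0.5: DR=0.50, FPR=0.25
    # threshold=0.25: DR=1.00, FPR=0.50
```

Lowering the threshold flags more instances, which raises the detection rate and the false positive rate together; a reported pair of rates describes a single point on this trade-off.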