1998 | Richard P. Lippmann, David J. Fried, Isaac Graf, Joshua W. Haines, Kristopher R. Kendall, David McClung, Dan Weber, Seth E. Webster, Dan Wyschogrod, Robert K. Cunningham, and Marc A. Zissman
The 1998 DARPA Off-line Intrusion Detection Evaluation assessed the performance of intrusion detection systems (IDS) in detecting various types of attacks. The evaluation involved generating realistic network traffic and launching more than 300 instances of 38 different automated attacks against UNIX hosts. Six research groups participated in a blind evaluation, and results were analyzed for four attack categories: probe, denial-of-service (DoS), remote-to-local (R2L), and user-to-root (U2R). The best systems detected old attacks at rates ranging from 63% to 93% at a false alarm rate of 10 alarms per day. Detection rates were much worse, however, for new R2L and DoS attacks that appeared only in the test data: the best systems missed roughly half of them, including attacks that gave remote users root-level access. These results suggest that further research should focus on techniques for finding new attacks rather than on extending existing rule-based approaches.
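For concreteness, the sketch below groups a few attack names published with the 1998 evaluation data into these four categories. The grouping is illustrative and abridged (the evaluation used 38 distinct attacks), and the helper function is our own, not part of the evaluation software.

```python
# Representative attacks from the 1998 evaluation, grouped into the four
# categories analyzed above (abridged; the full evaluation used 38 attacks).
ATTACK_CATEGORIES = {
    "probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "dos":   ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "r2l":   ["ftp_write", "guess_passwd", "imap", "phf", "warezclient"],
    "u2r":   ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

def categorize(attack_name):
    """Return the category of a named attack, or None if unknown."""
    for category, attacks in ATTACK_CATEGORIES.items():
        if attack_name in attacks:
            return category
    return None

print(categorize("smurf"))  # -> dos
```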
The evaluation test bed simulated a small Air Force base, with thousands of virtual hosts and hundreds of user automata generating live traffic. It included normal background traffic and labeled attacks, allowing participants to train their systems on training data and then be evaluated blind on test data. The evaluation used receiver operating characteristic (ROC) techniques to analyze the tradeoff between false alarm and detection rates. ROC curves showed how detection rates changed as each system's internal thresholds were varied to generate more or fewer false alarms. Measuring detection rates alone indicated which types of attacks an IDS could detect, but did not account for the human workload required to analyze false alarms.
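The ROC analysis described above amounts to a threshold sweep over scored alerts. The following is a minimal sketch of that computation, not the evaluation's actual scoring code: it assumes each IDS alert has been reduced to a (score, is_attack) pair against the ground-truth labels, and the alert values, attack count, and window length in the example are hypothetical.

```python
# Minimal sketch of an ROC-style sweep: for each candidate threshold,
# compute false alarms per day and the fraction of attacks detected.

def roc_points(alerts, num_attacks, test_days):
    """Sweep the detection threshold and report (false alarms/day, detection rate).

    alerts      -- list of (score, is_attack) tuples, one per alert
    num_attacks -- total number of attack instances in the test data
    test_days   -- length of the test period in days
    """
    points = []
    # Each distinct alert score is a candidate threshold.
    for threshold in sorted({score for score, _ in alerts}, reverse=True):
        fired = [(s, a) for s, a in alerts if s >= threshold]
        detections = sum(1 for _, is_attack in fired if is_attack)
        false_alarms = len(fired) - detections
        points.append((false_alarms / test_days, detections / num_attacks))
    return points

# Hypothetical example: 4 alerts scored by an IDS over a 2-day window
# containing 3 labeled attack instances (one attack raises no alert at all,
# so it can never be detected at any threshold).
alerts = [(0.9, True), (0.8, False), (0.6, True), (0.4, False)]
for fa_per_day, det_rate in roc_points(alerts, num_attacks=3, test_days=2):
    print(f"{fa_per_day:.1f} false alarms/day -> {det_rate:.0%} detected")
```

Lowering the threshold moves right along the curve, trading more false alarms per day for a higher detection rate, which is exactly the tradeoff the evaluation plotted for each system.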
The evaluation involved generating a wide variety of attacks, spanning those used by both novice and highly skilled attackers. The test bed used real hosts, live attacks, and live background traffic to recreate normal and attack traffic on a private network. Preparation covered both background traffic generation and attack development, including analyzing attack mechanisms and building stealthy versions of attacks. A separate real-time evaluation addressed practical concerns such as memory requirements, processor requirements, and ease of use.
The results showed that the best systems detected old attacks with high accuracy but missed many new attacks, including those that exploited system weaknesses or used stealthy techniques. Systems using BSM audit data or file system dumps provided good performance for certain types of attacks. Overall, the evaluation demonstrated the limitations of existing rule-based approaches and the need for research into techniques that can detect new attacks.