2024 | Il Hwan Ji, Ju Hyeon Lee, Min Ji Kang, Woo Jin Park, Seung Ho Jeon, Jung Taek Seo
This systematic literature review analyzes AI-based anomaly detection techniques for encrypted traffic. It identified 30 high-quality research articles published between 2019 and 2023 and examined them along six dimensions: datasets, feature extraction, feature selection, preprocessing, anomaly detection algorithms, and performance indicators. The results show that a range of techniques is used for AI-based anomaly detection over encrypted traffic; some are the same as those used for unencrypted traffic, while others differ. The datasets used in the reviewed studies include CTU-Malware-Captures, ISCXVPN-NONVPN, CTU-13, USTC-TFC 2016, NSL-KDD, UNSW-NB15, CIC-IDS-2017, Datacon 2020, MTA, STRA, UNSW NS 2019, CIC-AndMal2017, MCFP, CICIDS-2012, CIC-InvesAndMal2019, CIRA-CIC-DoHBRW-2020, and CSE-CIC-IDS 2018. These datasets contain varying percentages of encrypted traffic. The study also examined feature extraction methods, including statistics-based and log information-based approaches, and feature selection methods such as filtering, manual selection, and exhaustive search. Preprocessing techniques such as normalization, data cleaning, length unification, and data conversion were analyzed as well. The AI algorithms used for anomaly detection in the reviewed studies include linear regression, logistic regression, Naïve Bayes, C4.5, CART, K-nearest neighbor, ensemble methods, random forest, XGBoost, and support vector machine (SVM). The results indicate that AI-based anomaly detection over encrypted traffic is an active research area in which a variety of techniques and datasets are used to improve the accuracy and efficiency of detection models.
The study highlights the importance of using appropriate datasets, feature extraction, and preprocessing techniques to enhance the performance of AI-based anomaly detection models over encrypted traffic.
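
Two of the preprocessing steps named in the abstract, length unification and normalization, can be illustrated with a minimal sketch. It is not taken from any of the reviewed articles: the function names, the zero-padding choice, the min-max scaling choice, and the sample byte strings are all assumptions made for illustration.

```python
from typing import List, Sequence


def unify_length(payload: bytes, target_len: int, pad_value: int = 0) -> List[int]:
    """Truncate or pad a byte sequence so it has exactly target_len values.

    Padding with zeros is an assumed convention, not one prescribed by the review.
    """
    values = list(payload[:target_len])
    values.extend([pad_value] * (target_len - len(values)))
    return values


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every column to the range [0, 1]; constant columns map to 0.0."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical raw payload prefixes of three encrypted flows (made-up bytes).
flows = [b"\x16\x03\x01\x02", b"\x17\x03\x03\x00\x45\xff", b"\x16"]

# Step 1: length unification, so every flow becomes a 4-value vector.
fixed = [unify_length(f, target_len=4) for f in flows]

# Step 2: normalization, so every feature column lies in [0, 1].
scaled = min_max_normalize(fixed)

for row in scaled:
    print(row)
```

The resulting fixed-length, scaled vectors are the kind of input that classifiers listed in the abstract, such as random forest or SVM, typically expect; which byte length and which scaling scheme work best depends on the dataset and model.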