DEEP LEARNING-BASED MALICIOUS TRAFFIC ANALYSIS: A COMPREHENSIVE SURVEY

WenCai He; ZhiJie Peng; MingShen Zhang; He Zhu

doi:10.61784/jcsee3136

Authors

WenCai He, College of Cyber Security, Tarim University, Aral 843300, Xinjiang, China.
ZhiJie Peng, School of Physics and Electronics, Changsha University of Science and Technology, Changsha 410114, Hunan, China.
MingShen Zhang, College of Cyber Security, Tarim University, Aral 843300, Xinjiang, China.
He Zhu (Corresponding Author), College of Cyber Security, Tarim University, Aral 843300, Xinjiang, China.

Keywords:

Malicious traffic analysis, Deep learning, Encrypted traffic detection, Traffic representation learning, Network security

Abstract

Malicious traffic analysis has become increasingly challenging because of encryption-by-default communication, evolving attack behaviors, and distribution shifts across heterogeneous environments. Many studies still lack consistent pipeline designs, traffic representations, benchmark datasets, and assessment methods, which makes their results hard to compare. To address this gap, this paper systematically surveys deep-learning-based techniques for malicious traffic analysis. Existing work is organized into four parts: data acquisition and preprocessing, traffic representation, learning models, and performance evaluation. We compare representative methods, including traditional machine learning, deep learning, and hybrid approaches, in terms of detection accuracy, computational cost, cross-scenario generalization ability, and suitable metrics. We also identify key problems in practical deployment, such as limited visibility into encrypted traffic, biased datasets, and adversarial attacks, and summarize the related research directions. The survey serves as an introduction to the research area and as a reference for designing reproducible, literature-based evaluations.
