EVALUATING THE PERFORMANCE OF MACHINE LEARNING MODELS IN WEB ATTACK DETECTION

Authors

  • Le Anh Tuan, Ho Chi Minh City University of Industry and Trade (Corresponding Author)

DOI:

https://doi.org/10.62985/j.huit_ojs.vol26.no1E.373

Keywords:

Machine learning, detection system, web attacks.

Abstract

The rapid proliferation of web-based systems has been accompanied by a growing risk of sophisticated cyberattacks, including SQL injection (SQLi), cross-site scripting (XSS), and cross-site request forgery (CSRF). Traditional defense mechanisms adapt poorly to new threats and therefore struggle to detect emerging attack variants. This study proposes a machine learning–based approach to web attack detection, using the HTTP CSIC 2010 dataset as the experimental foundation. The dataset is preprocessed, and features are extracted from the HTTP requests to construct inputs for the classification models. Several machine learning algorithms, including Random Forest (RF) and Support Vector Machine (SVM), are used to identify anomalous requests. In the experiments, the Random Forest model achieves the best performance, with an accuracy of 96.03%, an F1-score of 96.00%, and a ROC-AUC of 0.995. These findings indicate that machine learning–based approaches can improve the detection of malicious HTTP requests in modern web environments.
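
The abstract does not list the features or the evaluation code, so the following is only a minimal sketch of the two steps it names: turning an HTTP request into numeric features and scoring predictions with accuracy and F1. The feature names, the `SUSPICIOUS_CHARS` set, the example URL, and both helper functions are illustrative assumptions, not the study's actual feature set or implementation. It uses only the Python standard library.

```python
from urllib.parse import urlsplit, parse_qsl, unquote_plus

# Characters that often appear in SQLi/XSS payloads. This is an illustrative
# choice (an assumption), not the feature set used in the study.
SUSPICIOUS_CHARS = set("'\"<>;()")


def extract_features(method, url, body=""):
    """Map one HTTP request to a small dict of numeric features (hypothetical)."""
    parts = urlsplit(url)
    raw_payload = parts.query + ("&" + body if body else "")
    payload = unquote_plus(raw_payload)  # decode %XX escapes and '+' as space
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += parse_qsl(body, keep_blank_values=True)
    return {
        "is_post": int(method.upper() == "POST"),
        "path_length": len(parts.path),
        "payload_length": len(payload),
        "num_params": len(params),
        "num_suspicious": sum(ch in SUSPICIOUS_CHARS for ch in payload),
        "num_digits": sum(ch.isdigit() for ch in payload),
    }


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels 1 (attack) and 0 (normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Made-up request and labels, used only to exercise the two helpers.
    features = extract_features(
        "GET", "http://localhost:8080/shop/index.jsp?id=1%27+OR+%271%27%3D%271"
    )
    print(features)
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a fuller pipeline, feature dicts like these would be stacked into a matrix and passed to a classifier such as Random Forest or SVM. ROC-AUC additionally requires predicted scores rather than hard labels, so it is left out of this sketch.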


Published

2026-05-27

Section

Information Technology