AI adversarial attack detection and mitigation for AI-based systems

Master's Thesis
Author
Ziras, Georgios (Ζήρας, Γεώργιος)
Date
2025-04-04
Advisor
Xenakis, Christos (Ξενάκης, Χρήστος)
Keywords
Artificial intelligence; Adversarial attacks; AI-based systems; Mitigation of adversarial attacks; Detection of adversarial attacks

Abstract
The increasing integration of Artificial Intelligence (AI) systems into critical infrastructure such as cybersecurity, healthcare, and finance has introduced significant challenges in ensuring model robustness against adversarial attacks. This thesis investigates the susceptibility of various machine learning (ML) models to adversarial manipulation and explores effective detection and mitigation strategies to enhance resilience. Using the CIC-IDS2017 dataset, a suite of ML classifiers, including Decision Tree, Random Forest, Logistic Regression, XGBoost, and a custom PyTorch-based neural network, was trained and subjected to a range of adversarial evasion attacks: the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), DeepFool, and the Carlini-Wagner attack.
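To make the attack setup concrete, the sketch below (illustrative code, not taken from the thesis) mounts FGSM and PGD with the open-source Adversarial Robustness Toolbox (ART) against a PyTorch classifier. The network architecture, the 78-feature input width, the [0, 1] feature scaling, and the perturbation budgets (eps values) are all assumptions standing in for the thesis's actual CIC-IDS2017 configuration.

```python
import numpy as np
import torch
import torch.nn as nn

from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod, ProjectedGradientDescent

# Placeholder network; 78 inputs stands in for the preprocessed CIC-IDS2017
# flow features (an assumption, not the thesis's exact feature count).
model = nn.Sequential(nn.Linear(78, 64), nn.ReLU(), nn.Linear(64, 2))
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(78,),
    nb_classes=2,
    clip_values=(0.0, 1.0),  # assumes features min-max scaled to [0, 1]
)

# Stand-in for real (scaled) test flows.
x_test = np.random.rand(16, 78).astype(np.float32)

# Craft adversarial versions of the test flows with two of the evaluated attacks.
fgsm = FastGradientMethod(estimator=classifier, eps=0.1)
pgd = ProjectedGradientDescent(estimator=classifier, eps=0.1, eps_step=0.01, max_iter=40)
x_adv_fgsm = fgsm.generate(x=x_test)
x_adv_pgd = pgd.generate(x=x_test)
```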
A key focus of this research is the evaluation of both direct attacks and transfer attacks, in which adversarial examples crafted against one model are replayed against another. The results reveal that while the traditional models suffered severe performance degradation, the deep learning model exhibited stronger resilience. To improve robustness, adversarial training was employed, significantly enhancing model accuracy under attack; the PyTorch model in particular retained over 98% accuracy in most cases.
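A minimal sketch of this defense, continuing from the previous snippet (it reuses `classifier` and `pgd`), uses ART's AdversarialTrainer; the 50/50 clean/adversarial mixing ratio, batch size, and epoch count are illustrative assumptions rather than the thesis's training configuration.

```python
import numpy as np
from art.defences.trainer import AdversarialTrainer

# Placeholder data standing in for the CIC-IDS2017 training split.
x_train = np.random.rand(256, 78).astype(np.float32)
y_train = np.eye(2)[np.random.randint(0, 2, size=256)].astype(np.float32)

# Each batch mixes clean flows with PGD-perturbed copies (ratio=0.5 is
# illustrative), so the model learns to classify both correctly.
trainer = AdversarialTrainer(classifier, attacks=pgd, ratio=0.5)
trainer.fit(x_train, y_train, batch_size=64, nb_epochs=20)
```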
Furthermore, this study integrates advanced detection mechanisms from the Adversarial Robustness Toolbox (ART), including the Binary Input and Binary Activation Detectors. These detectors demonstrated high recall and precision in identifying adversarial inputs, although their more moderate performance on clean samples (i.e., false positives) points to a trade-off between security and usability. The implementation of a dual-layer detection pipeline within a machine learning system illustrates a practical defense-in-depth approach, capable of blocking or flagging adversarial inputs before they reach the core classifier.
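The sketch below illustrates the input-level layer of such a pipeline with ART's BinaryInputDetector, continuing from the earlier snippets; the auxiliary detector architecture and the detect-then-filter gating are assumptions for illustration, and the `detect` call follows recent ART releases.

```python
import numpy as np
import torch
import torch.nn as nn

from art.estimators.classification import PyTorchClassifier
from art.defences.detector.evasion import BinaryInputDetector

# Small auxiliary classifier trained to answer "clean or adversarial?".
det_model = nn.Sequential(nn.Linear(78, 32), nn.ReLU(), nn.Linear(32, 2))
det_classifier = PyTorchClassifier(
    model=det_model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(det_model.parameters(), lr=1e-3),
    input_shape=(78,),
    nb_classes=2,
)
detector = BinaryInputDetector(det_classifier)

# Train on a mix of clean and adversarial flows (0 = clean, 1 = adversarial).
x_mix = np.concatenate([x_train, x_adv_pgd])
labels = np.concatenate(
    [np.zeros(len(x_train), dtype=int), np.ones(len(x_adv_pgd), dtype=int)]
)
detector.fit(x_mix, np.eye(2)[labels].astype(np.float32), nb_epochs=10)

# Gate the pipeline: flag suspicious inputs, forward only the rest.
report, is_adversarial = detector.detect(x_mix)
mask = np.asarray(is_adversarial, dtype=bool)
predictions = classifier.predict(x_mix[~mask])  # core classifier sees filtered traffic
```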
This research contributes a comprehensive analysis of adversarial attack resilience in intrusion detection systems and proposes a scalable architecture that combines robust training with real-time adversarial detection. Future work will focus on improving detector precision on clean samples (reducing false positives), incorporating more diverse datasets, and exploring adaptive defenses against evolving attack strategies.