AI adversarial attack detection and mitigation for AI-based systems

Master's Thesis
Author
Ziras, Georgios (Ζήρας, Γεώργιος)
Date
2025-04-04
Advisor
Xenakis, Christos (Ξενάκης, Χρήστος)
Keywords
Artificial intelligence; Adversarial attacks; AI-based systems; Mitigation of adversarial attacks; Detection of adversarial attacks

Abstract
The increasing integration of Artificial Intelligence (AI) systems into critical infrastructure such as cybersecurity, healthcare, and finance has introduced significant challenges in ensuring model robustness against adversarial attacks. This thesis investigates the susceptibility of various machine learning (ML) models to adversarial manipulation and explores effective detection and mitigation strategies to enhance resilience. Using the CIC-IDS2017 dataset, a suite of ML classifiers, including Decision Tree, Random Forest, Logistic Regression, XGBoost, and a custom PyTorch-based neural network, was trained and subjected to a range of adversarial evasion attacks: the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), DeepFool, and the Carlini-Wagner attack.
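To make the attack setup concrete, the sketch below (illustrative code, not taken from the thesis) mounts FGSM and PGD with the open-source Adversarial Robustness Toolbox (ART) against a PyTorch classifier. The network architecture, the 78-feature input width, the [0, 1] feature scaling, and the perturbation budgets (eps values) are all assumptions standing in for the thesis's actual CIC-IDS2017 configuration.

```python
import numpy as np
import torch
import torch.nn as nn

from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod, ProjectedGradientDescent

# Placeholder network; 78 inputs stands in for the preprocessed CIC-IDS2017
# flow features (an assumption, not the thesis's exact feature count).
model = nn.Sequential(nn.Linear(78, 64), nn.ReLU(), nn.Linear(64, 2))
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(78,),
    nb_classes=2,
    clip_values=(0.0, 1.0),  # assumes features min-max scaled to [0, 1]
)

# Stand-in for real (scaled) test flows.
x_test = np.random.rand(16, 78).astype(np.float32)

# Craft adversarial versions of the test flows with two of the evaluated attacks.
fgsm = FastGradientMethod(estimator=classifier, eps=0.1)
pgd = ProjectedGradientDescent(estimator=classifier, eps=0.1, eps_step=0.01, max_iter=40)
x_adv_fgsm = fgsm.generate(x=x_test)
x_adv_pgd = pgd.generate(x=x_test)
```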
A key focus of this research is the evaluation of both direct attacks and transfer attacks, in which adversarial examples crafted against one model are replayed against another. The results reveal that while the traditional models suffered severe performance degradation, the deep learning model exhibited stronger resilience. To improve robustness, adversarial training was employed, significantly enhancing model accuracy under attack; the PyTorch model in particular retained over 98% accuracy in most cases.
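A minimal sketch of this defense, continuing from the previous snippet (it reuses `classifier` and `pgd`), uses ART's AdversarialTrainer; the 50/50 clean/adversarial mixing ratio, batch size, and epoch count are illustrative assumptions rather than the thesis's training configuration.

```python
import numpy as np
from art.defences.trainer import AdversarialTrainer

# Placeholder data standing in for the CIC-IDS2017 training split.
x_train = np.random.rand(256, 78).astype(np.float32)
y_train = np.eye(2)[np.random.randint(0, 2, size=256)].astype(np.float32)

# Each batch mixes clean flows with PGD-perturbed copies (ratio=0.5 is
# illustrative), so the model learns to classify both correctly.
trainer = AdversarialTrainer(classifier, attacks=pgd, ratio=0.5)
trainer.fit(x_train, y_train, batch_size=64, nb_epochs=20)
```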
Furthermore, this study integrates advanced detection mechanisms from the Adversarial Robustness Toolbox (ART), including the Binary Input and Binary Activation Detectors. These detectors demonstrated high recall and precision in identifying adversarial inputs, although their more moderate performance on clean samples (i.e., false positives) points to a trade-off between security and usability. The implementation of a dual-layer detection pipeline within a machine learning system illustrates a practical defense-in-depth approach, capable of blocking or flagging adversarial inputs before they reach the core classifier.
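The sketch below illustrates the input-level layer of such a pipeline with ART's BinaryInputDetector, continuing from the earlier snippets; the auxiliary detector architecture and the detect-then-filter gating are assumptions for illustration, and the `detect` call follows recent ART releases.

```python
import numpy as np
import torch
import torch.nn as nn

from art.estimators.classification import PyTorchClassifier
from art.defences.detector.evasion import BinaryInputDetector

# Small auxiliary classifier trained to answer "clean or adversarial?".
det_model = nn.Sequential(nn.Linear(78, 32), nn.ReLU(), nn.Linear(32, 2))
det_classifier = PyTorchClassifier(
    model=det_model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(det_model.parameters(), lr=1e-3),
    input_shape=(78,),
    nb_classes=2,
)
detector = BinaryInputDetector(det_classifier)

# Train on a mix of clean and adversarial flows (0 = clean, 1 = adversarial).
x_mix = np.concatenate([x_train, x_adv_pgd])
labels = np.concatenate(
    [np.zeros(len(x_train), dtype=int), np.ones(len(x_adv_pgd), dtype=int)]
)
detector.fit(x_mix, np.eye(2)[labels].astype(np.float32), nb_epochs=10)

# Gate the pipeline: flag suspicious inputs, forward only the rest.
report, is_adversarial = detector.detect(x_mix)
mask = np.asarray(is_adversarial, dtype=bool)
predictions = classifier.predict(x_mix[~mask])  # core classifier sees filtered traffic
```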
This research contributes a comprehensive analysis of adversarial attack resilience in intrusion detection systems and proposes a scalable architecture that combines robust training with real-time adversarial detection. Future work will focus on improving detector precision on clean samples (reducing false positives), incorporating more diverse datasets, and exploring adaptive defenses against evolving attack strategies.