Detecting phishing URLs with machine learning models

Κατσικαρέλης, Στέφανος

Master's Thesis

Author

Κατσικαρέλης, Στέφανος

Date

2025

Abstract

This master's thesis develops and evaluates machine learning models for detecting phishing URLs, a growing problem in cybersecurity. The research uses features extracted from URLs to classify them as malicious or legitimate, drawing on a labeled dataset from the Mendeley repository. The methodology comprises data preprocessing, the construction of quantitative features (e.g., URL length, presence of HTTPS), and the application of four models: RandomForest, LogisticRegression, XGBoost, and LightGBM. The models were evaluated with accuracy, precision, recall, and F1-score, together with cross-validation, and achieved accuracy above 95%. The results indicate that XGBoost performs best at phishing detection (recall of approximately 90%) and confirm the importance of features such as URL length. Despite limitations, including imbalanced data and an exclusive focus on URLs, the work offers a reliable approach to strengthening cybersecurity. Suggested future improvements include oversampling techniques (SMOTE) and the integration of neural networks to achieve greater effectiveness.
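
As a rough illustration of the pipeline the abstract describes, the sketch below derives a few quantitative features from a raw URL and computes precision, recall, and F1-score from predicted labels. It is not the thesis's code: only URL length and the presence of HTTPS are named in the abstract, so the remaining features (`num_dots`, `num_digits`, `has_at_symbol`), the function names, and the label convention (1 = phishing) are assumptions made for the example. It uses only the Python standard library; in practice the feature rows would be passed to a classifier such as those listed above.

```python
from urllib.parse import urlparse


def extract_features(url: str) -> dict:
    """Turn a raw URL string into a small set of numeric features.

    Only url_length and uses_https come from the abstract; the other
    features are hypothetical additions for illustration.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "num_dots": host.count("."),                      # assumed feature
        "num_digits": sum(ch.isdigit() for ch in url),    # assumed feature
        "has_at_symbol": int("@" in url),                 # assumed feature
    }


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(extract_features("https://example.com/login"))
    # Toy labels, invented for the example: 1 = phishing, 0 = legitimate.
    print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Recall is reported separately from accuracy because, with imbalanced data, a model can reach high accuracy while still missing a sizeable share of phishing URLs.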

Postgraduate Studies Programme

Advanced Information Systems

Department

School of Information and Communication Technologies. Department of Digital Systems

Language

Greek

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/18652

Collections

Department of Digital Systems

Except where otherwise noted, this item's license is described as
Attribution-NonCommercial-NoDerivs 3.0 Greece