Detection of phishing URLs with machine learning models

Keywords
Machine learning; Cybersecurity; Phishing detection
Abstract
This master's thesis focuses on the development and evaluation of machine
learning models for detecting phishing URLs, an escalating issue in
cybersecurity. The research aims to leverage features extracted from URL
addresses to classify them as malicious or legitimate, utilizing a labeled dataset
from the Mendeley repository. The methodology encompasses data
preprocessing, the creation of quantitative features (e.g., URL length, presence
of HTTPS), and the application of four models: RandomForest,
LogisticRegression, XGBoost, and LightGBM. Evaluation was performed using
metrics such as accuracy, precision, recall, and F1-score, along with
cross-validation, revealing accuracy exceeding 95%. The results indicate that XGBoost
outperforms the other models in phishing detection (recall ~90%), confirming the significance of
features like URL length. Despite limitations, including data imbalance and an
exclusive focus on URLs, the work provides a reliable approach for bolstering
cybersecurity. Future enhancements are suggested, such as oversampling
techniques (SMOTE) and the integration of neural networks, to achieve greater
effectiveness.
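
As a rough illustration of the kind of pipeline the abstract describes, the sketch below derives a few quantitative features from a URL string and computes precision, recall, and F1 for the phishing class. Only URL length and HTTPS presence are named in the abstract; the other features, the function names `extract_features` and `precision_recall_f1`, and the label convention (1 = phishing) are assumptions made for this example and are not taken from the thesis.

```python
from urllib.parse import urlparse


def extract_features(url: str) -> dict:
    """Map a URL string to a small dictionary of numeric features.

    url_length and uses_https follow the abstract; the remaining
    features are illustrative assumptions, not the thesis's feature set.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "num_dots": host.count("."),                     # assumed feature
        "num_digits": sum(ch.isdigit() for ch in url),   # assumed feature
        "has_at_symbol": int("@" in url),                # assumed feature
    }


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In a setup like the one summarized above, feature dictionaries of this kind would be assembled into a table and passed to classifiers such as RandomForest or XGBoost. Recall on the phishing class is the figure the abstract highlights, since a missed phishing URL is typically costlier than a false alarm.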


