Μέθοδοι επιβλεπόμενης μηχανικής μάθησης για την εκτίμηση της πιθανότητας αθέτησης
Supervised machine learning methods for estimating probability of default
Master Thesis
Author
Kourouklis, Christos
Κουρούκλης, Χρήστος
Date
2024-09
Advisor
Bersimis, Sotiris
Μπερσίμης, Σωτήριος
Keywords
Data science ; Statistics ; Machine learning ; Probability of default ; PD ; Credit risk ; Statistical modeling ; Logistic regression ; Random forest ; Decision trees ; Gradient boosting ; Neural networks ; Cross validation ; Feature selection ; Information value ; IV ; Gini ; Hyperparameter tuning ; Population Stability Index ; PSI ; Applied statistics ; Supervised machine learning
Abstract
This thesis addresses the estimation of Probability of Default (PD) with binary supervised learning methods, a task essential for effective risk management and regulatory compliance in financial institutions and the banking sector. It first discusses the role of financial institutions, the challenges they face, and the importance of accurate PD estimation for mitigating credit risk and meeting regulatory requirements, and provides a concise literature review on credit risk and PD estimation, from traditional statistical methods to more recent advances in machine learning. It then describes the theoretical background of Logistic Regression, Random Forest, Gradient Boosting, and Neural Networks, including data management, evaluation metrics, and model training processes. In the final part, these methods are implemented on real-world credit data, and their performance is compared using the Kolmogorov-Smirnov statistic, the Gini coefficient, and the Area Under the Curve, metrics that are standard in the banking industry. The thesis aims to enhance credit risk assessment practices through the application, comparison, evaluation, and optimization of these statistical machine learning models.