Advanced deep learning techniques for credit risk prediction

Solanakis, Spyridon; Σολανάκης, Σπυρίδων

Master Thesis

Author

Solanakis, Spyridon

Σολανάκης, Σπυρίδων

Date

2025-02

Advisor

Sotiropoulos, Dionisios
Σωτηρόπουλος, Διονύσιος

Keywords

Interpolation; Regression; Artificial neural networks; Deep learning; Kolmogorov-Arnold networks; Credit scoring

Abstract

This thesis extends the work presented in "Evolving Transparent Credit Risk Models: A Symbolic Regression Approach Using Genetic Programming" (Sotiropoulos et al. 2024), with the aim of improving the accuracy of credit risk prediction through advanced modeling techniques. The original study proposed a layered regression framework for modeling normalized FICO scores, discretized into 20 bins that represent distinct levels of credit risk. Although the framework handled the regression task effectively with various machine learning models, it ran into difficulties in the higher-index layers because of greater data complexity and class overlap. Building on these findings, this research introduces Kolmogorov-Arnold Networks (KANs) as an interpretable alternative to black-box models. It also explores advanced feature engineering and iterative dataset refinement to address misclassified data points and the high variability of the higher-index layers. The key innovations are: reassigning data points to FICO bins according to their proximity to bin centroids; redistributing data points across layers to ensure balanced representation; and using the mean FICO value of the k nearest bin centroids. These methods substantially improved prediction accuracy, reaching near-perfect values in some configurations, particularly for datasets with extended feature sets. The results support the hypothesis that some data points were originally assigned to the wrong bins, or that latent features not captured in the dataset influence bin assignments. Overall, the findings indicate that refining FICO bin assignments and improving the distribution of data across layers are critical steps toward better model performance. Incorporating additional features or domain knowledge could further improve credit risk predictions and provide a stronger foundation for future work in the field.
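The centroid-based bin refinement described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify: normalized FICO scores in [0, 1] are split into 20 equal-width bins; a bin's centroid is the mean feature vector of its points; distance is Euclidean in feature space; and "mean FICO value of the k nearest bin centroids" is read as averaging the mean score of each of the k closest bins. All function names are hypothetical. This is not the thesis's implementation, and the layer redistribution step is not covered.

```python
import math
from statistics import mean

N_BINS = 20  # number of FICO bins stated in the abstract


def fico_bin(score, n_bins=N_BINS):
    """Map a normalized FICO score in [0, 1] to a bin index 0..n_bins-1.

    Assumption: equal-width bins; the thesis may discretize differently.
    """
    return min(int(score * n_bins), n_bins - 1)


def group_by_bin(values, bins):
    """Collect values into a dict keyed by bin index."""
    grouped = {}
    for v, b in zip(values, bins):
        grouped.setdefault(b, []).append(v)
    return grouped


def bin_centroids(features, bins):
    """Mean feature vector of the points currently assigned to each bin."""
    return {b: [mean(col) for col in zip(*xs)]
            for b, xs in group_by_bin(features, bins).items()}


def bin_mean_fico(scores, bins):
    """Mean normalized FICO score of the points in each bin."""
    return {b: mean(s) for b, s in group_by_bin(scores, bins).items()}


def nearest_bins(x, centroids, k=1):
    """The k bin indices whose centroids are closest to x (Euclidean)."""
    return sorted(centroids, key=lambda b: math.dist(x, centroids[b]))[:k]


def reassign_to_nearest_centroid(features, bins):
    """One refinement pass: move each point to the bin with the closest centroid."""
    centroids = bin_centroids(features, bins)
    return [nearest_bins(x, centroids)[0] for x in features]


def knn_centroid_fico(x, features, scores, bins, k=3):
    """Hypothetical target: average bin-mean FICO of the k bins nearest to x."""
    centroids = bin_centroids(features, bins)
    fico = bin_mean_fico(scores, bins)
    return mean(fico[b] for b in nearest_bins(x, centroids, k))


# Toy data: the fourth point has a bin-10 score but bin-0-like features.
scores = [0.02, 0.04, 0.51, 0.53, 0.97]
features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [0.05, 0.05], [9.0, 9.0]]
bins = [fico_bin(s) for s in scores]                    # [0, 0, 10, 10, 19]
refined = reassign_to_nearest_centroid(features, bins)  # [0, 0, 10, 0, 19]
print(knn_centroid_fico([5.0, 5.0], features, scores, refined, k=2))  # about 0.74
```

In the toy data, the fourth point starts in bin 10 because of its score, but its features sit on top of the bin-0 centroid, so one refinement pass moves it to bin 0. This mirrors the misclassification hypothesis described above. A full refinement would presumably repeat the pass until the assignments stop changing.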