Advanced deep learning techniques for credit risk prediction
Προηγμένες τεχνικές βαθιάς μάθησης για πρόβλεψη πιστωτικού κινδύνου
Master Thesis
Author
Solanakis, Spyridon
Σολανάκης, Σπυρίδων
Date
2025-02
Keywords
Παρεμβολή ; Τεχνητά νευρωνικά δίκτυα ; Βαθιά μάθηση ; Δίκτυα Kolmogorov-Arnold ; Αξιολόγηση πιστοληπτικής ικανότητας
Regression ; Artificial neural networks ; Deep learning ; Kolmogorov-Arnold networks ; Credit scoring
Abstract
This thesis extends the work presented in "Evolving Transparent Credit Risk Models: A Symbolic
Regression Approach Using Genetic Programming" (Sotiropoulos et al. 2024), focusing on improving
credit risk prediction accuracy through advanced modeling techniques. The original study proposed a
layered regression framework to model normalized FICO scores, discretized into 20 bins representing
distinct levels of credit risk. While the framework effectively addressed the regression task using various
machine learning models, challenges were identified in the higher-index layers due to increased data
complexity and class overlap.
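
The discretization step described above can be illustrated with a minimal sketch. It assumes the FICO scores have already been normalized to [0, 1] and that the 20 bins are equal-width; the abstract does not state how the bin edges were chosen, so both the edges and the function name `assign_fico_bins` are illustrative assumptions rather than the original study's procedure.

```python
import numpy as np

def assign_fico_bins(scores, n_bins=20):
    """Map normalized FICO scores in [0, 1] to bin indices 0..n_bins-1.

    Assumption: equal-width bins. The original framework may use
    different edges (for example, quantile-based ones).
    """
    scores = np.asarray(scores, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Only the interior edges are passed, so a score of exactly 1.0
    # falls into the last bin instead of an extra overflow bin.
    return np.digitize(scores, edges[1:-1])

bins = assign_fico_bins([0.0, 0.02, 0.51, 0.99, 1.0])
print(bins.tolist())
```

Higher bin indices here correspond to higher normalized scores; how bins map onto the layers of the regression framework is not specified in the abstract and is left out of the sketch.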
Building upon these findings, this research introduces Kolmogorov-Arnold Networks (KANs) as an
interpretable alternative to black-box models. The study also explores advanced feature engineering
and iterative dataset refinement methods to address issues of data misclassification and variability in
higher-index layers. Key innovations include the reassignment of data points to FICO bins based on
proximity to centroids, the redistribution of data points across layers to ensure balanced representation,
and the application of mean FICO values from the k nearest bin centroids. These methodologies yielded
significant improvements in prediction accuracy, which reached near-perfect levels in some
configurations, particularly for datasets with extended feature sets.
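
The centroid-based refinements listed above can be sketched as follows. This is a hedged illustration, not the thesis implementation: it assumes centroids are per-bin feature means, that proximity is Euclidean distance in feature space, and that every bin is non-empty. The function names (`bin_centroids`, `reassign_to_nearest_centroid`, `knn_centroid_fico`) and the parameter `bin_mean_fico` are hypothetical.

```python
import numpy as np

def bin_centroids(X, bins, n_bins):
    """Mean feature vector of each bin (assumes every bin is non-empty)."""
    return np.stack([X[bins == b].mean(axis=0) for b in range(n_bins)])

def _distances(X, centroids):
    # Pairwise Euclidean distances, shape (n_samples, n_bins).
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

def reassign_to_nearest_centroid(X, centroids):
    """Move each data point to the bin whose centroid is closest."""
    return _distances(X, centroids).argmin(axis=1)

def knn_centroid_fico(X, centroids, bin_mean_fico, k=3):
    """Average the mean FICO values of the k nearest bin centroids."""
    nearest = np.argsort(_distances(X, centroids), axis=1)[:, :k]
    return np.asarray(bin_mean_fico)[nearest].mean(axis=1)

# Toy data: the last point sits near bin 0 but is labeled as bin 1.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.2, 0.1]])
bins = np.array([0, 0, 1, 1, 1])
centroids = bin_centroids(X, bins, n_bins=2)
new_bins = reassign_to_nearest_centroid(X, centroids)
targets = knn_centroid_fico(X, centroids, bin_mean_fico=[0.2, 0.8], k=1)
```

The sketch performs a single pass. An iterative refinement would recompute the centroids from `new_bins` and repeat until assignments stop changing, but the abstract does not describe the stopping rule, so it is omitted here.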
The results support the hypothesis that certain data points were initially assigned to incorrect
bins or that additional latent features not captured in the dataset influence bin assignments. Ultimately,
the findings suggest that refining FICO bin assignments and improving data distribution across layers are
critical steps toward enhancing model performance. Furthermore, the integration of additional features
or domain knowledge has the potential to further optimize credit risk predictions and provide a stronger
foundation for future advancements in the field.