Συγκριτική μελέτη μεθοδολογιών μηχανικής μάθησης για την πρόγνωση της έκβασης δανείων
Comparative study of machine learning approaches for the loan outcome prediction problem

Keywords
Credit risk prediction ; Logistic regression ; Random forest ; XGBoost ; LightGBM ; Neural network ; Loans ; Machine learning ; Credit scoring
Abstract
Credit risk prediction is essential for financial decision-making because it allows institutions to assess the likelihood of default before offering loans. Using the publicly available Kaggle "Credit Risk Dataset," we compare five popular machine learning techniques for predicting loan outcomes: logistic regression, Random Forest, XGBoost, LightGBM, and a neural network (multilayer perceptron). Because the dataset contains a variety of financial and demographic features, it requires extensive preprocessing, including handling missing values, encoding categorical variables, and normalizing input features.
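The preprocessing steps named above (missing-value handling, categorical encoding, feature normalization) feeding a logistic regression baseline can be sketched with scikit-learn as follows. This is a minimal illustration, not the study's actual pipeline: the toy rows, the column names (`person_income`, `loan_amnt`, `loan_intent`, `loan_status`), the median and most-frequent imputation strategies, and the `class_weight="balanced"` setting are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows standing in for the real data; the column names are assumptions.
df = pd.DataFrame({
    "person_income": [52000, 31000, np.nan, 78000, 24000, 61000],
    "loan_amnt": [8000, 12000, 5000, np.nan, 15000, 7000],
    "loan_intent": ["EDUCATION", "MEDICAL", "PERSONAL",
                    "EDUCATION", np.nan, "MEDICAL"],
    "loan_status": [0, 1, 0, 0, 1, 0],  # 1 = default (assumed encoding)
})
numeric_cols = ["person_income", "loan_amnt"]
categorical_cols = ["loan_intent"]

# Numeric features: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical features: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Baseline model; class_weight="balanced" is one common way to reweight the
# minority class and is an assumption here, not the study's stated choice.
baseline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

X = df[numeric_cols + categorical_cols]
y = df["loan_status"]
baseline.fit(X, y)
default_probability = baseline.predict_proba(X)[:, 1]
```

Keeping the imputers, scaler, and encoder inside one `Pipeline` means they are fitted only on the training data passed to `fit`, which avoids leaking statistics from held-out rows when the same object is used in cross-validation.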
With an emphasis on handling class imbalance, our goal is to evaluate the advantages and disadvantages of each method using key classification metrics: accuracy, precision, recall, F1-score, and AUC. Logistic regression serves as a baseline, while the Random Forest ensemble and the boosting algorithms XGBoost and LightGBM aim to capture complex feature interactions. We also investigate how well neural networks generalize on complex data.
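To illustrate why accuracy alone can mislead under class imbalance, the metrics listed above can be computed with `sklearn.metrics` as in the sketch below. The labels, scores, the 0.5 decision threshold, and the helper name `evaluate` are hypothetical and chosen only for illustration; they are not results from the study.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

def evaluate(y_true, y_score, threshold=0.5):
    """Hypothetical helper: threshold the scores and collect the metrics."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        # AUC uses the raw scores, so it does not depend on the threshold.
        "auc": roc_auc_score(y_true, y_score),
    }

# Made-up imbalanced sample: 7 repaid loans (0) and 3 defaults (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.40, 0.70, 0.90]

model_metrics = evaluate(y_true, y_score)

# A trivial "never default" predictor scores 0 for every applicant.
trivial_metrics = evaluate(y_true, [0.0] * len(y_true))
```

On this toy sample the trivial predictor still reaches 70% accuracy while its recall is zero, which is why recall, F1-score, and AUC are reported alongside accuracy when defaults are the minority class.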
The experimental results show that ensemble boosting models, especially LightGBM, achieve the best balance between accuracy and recall and outperform the other models in F1-score and AUC, largely due to their effective handling of class imbalance and feature importance. Gradient boosting methods therefore offer a powerful approach for tabular credit risk data and merit careful study in real-world credit scoring systems.


