Credit risk analysis using machine learning methods and explainable AI
Ανάλυση πιστωτικού κινδύνου με χρήση μεθόδων μηχανικής μάθησης και εξηγήσιμης τεχνητής νοημοσύνης
Master Thesis
Author
Sopileidi, Ailina
Σωπηλείδη, Αιλήνα
Date
2024-07
Keywords
Probability of default ; Credit risk ; Machine Learning Algorithms ; Explainable AI ; Credit scoring ; Customer segmentation ; Clustering ; Decision tree ; Peer-to-peer lending ; Scorecard
Abstract
This thesis focuses on the determinants of loan defaults on peer-to-peer (P2P) lending platforms and
analyzes whether the dependent variable “default” can be predicted. The dataset used for this
research comprises 211,283 Bondora platform loans and 38 variables. The time period covered is from
February 2009 until October 2022, and only completed (paid off or defaulted) loans are
considered. P2P lending is the act of lending money to individuals or small and mid-size
enterprises via online platforms that connect lenders and borrowers. One of the hot topics in
this field is the risk assessment of applicants. To make sure the client will be able to pay back
the loan within the agreed duration, a P2P lending company assesses the risk of each applicant
individually.
This is done using a decision tree model that estimates the default probability of
loans. The main findings of this research are that age, PrincipalBalance, interest rate, loan
duration, MonthlyPayment and Debt to Income are positively related to the probability of default.
In contrast, the borrowers’ income, number of previous loans and value of previous loans are
negatively related to default risk.
In addition, a decision tree model and a scorecard are developed to predict the probability of loan
default. The predictive ability of the decision tree model is examined through comparison with
other machine learning models such as Neural Networks, Random Forest and Decision Trees.
Based on the predictive measures (the misclassification rate, the accuracy and the ROC curve),
it appears that the Decision Tree model (Chi-Square) has relatively good predictive ability.
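As a rough illustration of the predictive measures named above, the following minimal Python sketch computes accuracy, misclassification rate and the area under the ROC curve for a handful of made-up loans. The labels, scores and the 0.5 decision threshold are illustrative assumptions and are not taken from the thesis, which reports its results from SAS.

```python
def accuracy(y_true, y_pred):
    """Share of loans whose predicted class equals the observed class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def misclassification_rate(y_true, y_pred):
    """Complement of accuracy: share of loans classified incorrectly."""
    return 1.0 - accuracy(y_true, y_pred)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen defaulted loan (1)
    receives a higher score than a randomly chosen repaid loan (0);
    ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = defaulted, 0 = paid off.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]  # assumed model default probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))
print(misclassification_rate(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Accuracy and misclassification rate depend on the chosen threshold, whereas the ROC-based measure summarizes ranking quality across all thresholds, which is one reason the three are typically reported together.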
Additionally, we examined the ability of the customers to repay credit loans by classifying the
loan receivers as ‘high risk’ or ‘low risk’ using a scorecard in SAS.
The term ‘low risk’ indicates that the loan receiver has an acceptable score and there have been no
problematic payment records. On the other hand, the term ‘high risk’ suggests the opposite:
the applicant has a bad credit score, or there are records of delayed payments or past
defaults.
Scorecards are essential in the lending process as they quantify the risk associated with
loan applicants. They transform various borrower attributes, such as credit history, income
levels, and employment status, into a numerical score.
This score indicates the likelihood of a borrower defaulting on a loan, helping lenders make
informed decisions that balance risk and reward.
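One common way to turn a default probability into a scorecard score is the "points to double the odds" (PDO) scaling, sketched below in Python. This is a generic illustration and not the SAS scorecard built in the thesis: the base score of 600 at good-to-bad odds of 50:1, the PDO of 20 and the cutoff of 550 are all hypothetical values chosen for the example.

```python
import math


def score_from_pd(pd_default, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a probability of default to a score using PDO scaling.

    All default parameter values are assumptions for illustration:
    base_score is the score assigned at good:bad odds of base_odds,
    and pdo is the number of points that doubles those odds.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = (1.0 - pd_default) / pd_default  # good:bad odds
    return offset + factor * math.log(odds)


def risk_label(score, cutoff=550.0):
    """Classify an applicant with a hypothetical score cutoff."""
    return "low risk" if score >= cutoff else "high risk"


# Hypothetical applicants with assumed default probabilities.
for pd_default in (1 / 101, 1 / 51, 0.5):
    s = score_from_pd(pd_default)
    print(round(s, 1), risk_label(s))
```

Under these assumptions, an applicant whose odds of repaying are twice as good receives exactly 20 more points, so differences in score have a direct interpretation in terms of odds.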