Application of data mining techniques to financial data: advantages and disadvantages in a bank and in credit cards, using Python

Keywords
Big data; Machine learning; Data mining
Abstract
This thesis examines the application of knowledge discovery and machine learning techniques to financial data, with a particular focus on the banking sector and credit risk management. Its main objective is to leverage large datasets in order to extract useful knowledge that can support improved decision-making strategies, while also enhancing risk management and the performance of banking products.
For the implementation, two publicly available datasets were used: the Bank Marketing Dataset, which concerns customer targeting for banking products through marketing campaigns, and the Default of Credit Card Clients Dataset, which is related to the prediction of payment defaults by credit card customers. The data were preprocessed using transformation techniques such as One-Hot Encoding for categorical variables and normalization for numerical variables.
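The two transformations named above can be illustrated with a minimal sketch in plain Python. The abstract does not say which libraries or which normalization variant the thesis used, so min-max scaling is an assumption here, and the `jobs` and `balances` values are hypothetical toy records rather than rows from either dataset.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator vector, one column per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def min_max_normalize(values):
    """Rescale numeric values linearly to [0, 1] (assumed normalization variant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy records, not taken from either dataset.
jobs = ["admin", "technician", "admin", "services"]
balances = [1200.0, 300.0, 0.0, 600.0]

categories, job_columns = one_hot_encode(jobs)
scaled_balances = min_max_normalize(balances)

# Each row: one indicator column per job category, then the scaled balance.
rows = [encoded + [scaled] for encoded, scaled in zip(job_columns, scaled_balances)]
```

In practice the category list and the minimum and maximum would be computed on the training split only and reused on the test split, so that no information leaks from the held-out data.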
Data imbalance was addressed using the SMOTE (Synthetic Minority Oversampling Technique) and Random Undersampling methods, in order to improve the classifiers’ performance on the underrepresented (minority) class. For model development, supervised learning algorithms were employed, including Random Forest, Logistic Regression, and Gradient Boosting. The results were evaluated based on metrics such as Accuracy, Precision, Recall, F1-Score, and ROC-AUC.
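As a rough illustration of the two resampling ideas, the sketch below implements random undersampling and a simplified SMOTE-style interpolation in plain Python. It is not the thesis's implementation: the function names, the binary-label assumption, the neighbour count `k`, and the toy points are all hypothetical, and a maintained library would normally be used instead.

```python
import random


def random_undersample(X, y, majority_label, rng):
    """Keep every minority row and an equally sized random subset of majority rows."""
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    kept = rng.sample(majority, len(minority))
    indices = sorted(kept + minority)
    return [X[i] for i in indices], [y[i] for i in indices]


def smote_like_oversample(minority_points, n_new, k, rng):
    """Create synthetic minority points by interpolating towards a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_points)
        # k nearest minority neighbours of `base` (squared Euclidean distance).
        others = [p for p in minority_points if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.1],
     [1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
X_under, y_under = random_undersample(X, y, majority_label=0, rng=rng)

minority_points = [x for x, label in zip(X, y) if label == 1]
new_points = smote_like_oversample(minority_points, n_new=2, k=2, rng=rng)
```

Resampling is applied to the training split only; evaluating on resampled data would distort the reported metrics.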
In the Bank Marketing dataset, Random Forest achieved a ROC-AUC of 0.92 with high stability, while Gradient Boosting reached approximately 0.91. In the credit card dataset, the best performances were once again achieved by Random Forest and Gradient Boosting, highlighting the flexibility and effectiveness of these algorithms in handling complex financial data.
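For readers unfamiliar with the reported metrics, the following sketch shows how Accuracy, Precision, Recall, F1-Score, and ROC-AUC can be computed for a binary classifier in plain Python. The labels, scores, and the 0.5 decision threshold are hypothetical illustrations and are unrelated to the thesis results; ROC-AUC is computed here through its pairwise-ranking interpretation, i.e. the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike the other four metrics, ROC-AUC does not depend on the chosen threshold, which is one reason it is commonly reported for imbalanced problems such as these.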
The thesis concludes that the combined use of preprocessing techniques and advanced machine learning models offers significant potential for improving predictive accuracy in real-world banking applications. In addition, suggestions are provided for further research involving real banking data and the study of more advanced deep learning methods.

