Diabetes prediction using data mining algorithms and analysis with Python

Keywords
Data mining; Python; Diabetes prediction; Analysis with Python

Abstract
This diploma thesis examines the use of machine learning methods to predict the occurrence
of diabetes. Several classification algorithms were trained on a large health dataset
to identify the model with the best predictive performance. The algorithms tested were
Logistic Regression, Random Forest, SVM, XGBoost and LightGBM.
They were evaluated on standard metrics: accuracy, precision, recall, and the ROC curve.
XGBoost and LightGBM achieved the highest performance, with 97% accuracy. Applying
SMOTE (Synthetic Minority Over-sampling Technique) also proved quite effective, improving
the predictive ability of the trained models overall. Finally, the development of an API
that produces an automatic diagnosis from a patient's input data demonstrates the
immediate applicability and scalability of the study, and shows how it could become part
of broader real-world information systems in clinical practice.
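The evaluation workflow described in the abstract (oversampling the minority class with SMOTE, then scoring classifiers on accuracy, precision, recall and the ROC curve) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code. It uses a synthetic imbalanced dataset from scikit-learn's `make_classification` in place of the health data. It uses a hand-written SMOTE in place of a library implementation, and the helper names `smote` and `evaluate` are hypothetical. It covers only Logistic Regression and Random Forest, and it summarizes the ROC curve as its area under the curve (ROC AUC). XGBoost's `XGBClassifier` and LightGBM's `LGBMClassifier` expose the same scikit-learn `fit`/`predict`/`predict_proba` interface and could be added to the `models` dictionary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote(X, y, minority_label=1, k=5, seed=0):
    """Hypothetical minimal SMOTE: add synthetic minority samples until classes are balanced."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label) - len(X_min))
    if n_needed <= 0:
        return X, y
    # Ask for k + 1 neighbours because each minority point's nearest neighbour is itself
    _, neigh = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_needed)
    # Pick one of the k real neighbours (column 0 is the point itself)
    partner = neigh[base, rng.integers(1, k + 1, size=n_needed)]
    # Place each synthetic point at a random position on the segment base -> partner
    gap = rng.random((n_needed, 1))
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_new, y_new


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model and report accuracy, precision, recall and ROC AUC on the test set."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred),
            "roc_auc": roc_auc_score(y_test, proba),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in for the health dataset (about 10% positive cases)
    X, y = make_classification(n_samples=2000, n_features=8, weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
    # Oversample the training split only, so no synthetic points reach the test set
    X_bal, y_bal = smote(X_tr, y_tr)
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    for name, scores in evaluate(models, X_bal, y_bal, X_te, y_te).items():
        print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

SMOTE is applied after the train/test split. Synthetic cases are therefore used only for training, and the reported metrics are computed on real, untouched test cases.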