Data analysis and prediction algorithms with Python

Master Thesis
Author
Voulgari, Evangelia
Βούλγαρη, Ευαγγελία
Date
2025-02
Keywords
Data analysis ; Prediction algorithms ; Python ; Healthcare data ; Machine learning
Abstract
With the rapid growth in available data, the field of data analysis has developed alongside it, especially in healthcare. Heart
disease is one of the leading causes of death worldwide, making early detection crucial.
Traditional diagnostic methods for heart disease often require expensive and time-consuming medical
tests, which may not always yield accurate results. Machine learning, by contrast, offers a
faster and more efficient way to analyze patient data and predict heart disease risk.
This thesis explores the application of Python-based data analysis and machine learning algorithms to
heart disease prediction. The dataset used contains key medical attributes. Preprocessing techniques
such as data cleaning, normalization, and feature encoding are applied to prepare the data for
further processing. Exploratory Data Analysis (EDA)
is applied to identify patterns and correlations among the features. The study implements and
compares two machine learning models: the Random Forest Classifier, a supervised learning
approach, and K-Means Clustering, an unsupervised learning method. Model performance is
evaluated using key metrics, including accuracy, precision, recall, and F1-score.
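The workflow described in this paragraph can be sketched roughly as follows. This is a minimal illustration and not the thesis code: it assumes scikit-learn, uses a synthetic dataset from make_classification in place of the real patient data, and picks hyperparameters arbitrarily. Feature encoding is omitted because the synthetic features are already numeric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the patient table, NOT the thesis dataset.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)

# Hold out a test set before fitting anything, so evaluation uses unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Normalization: fit the scaler on the training split only, then apply to both.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Supervised model: Random Forest trained on labelled examples.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train_s, y_train)
y_pred = forest.predict(X_test_s)

scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}

# Unsupervised model: K-Means never sees the labels while fitting.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X_train_s)

# Cluster ids are arbitrary, so map each cluster to its majority class
# in the training split before comparing against the true labels.
mapping = {c: np.bincount(y_train[clusters == c]).argmax() for c in np.unique(clusters)}
kmeans_pred = np.array([mapping[c] for c in kmeans.predict(X_test_s)])
kmeans_accuracy = accuracy_score(y_test, kmeans_pred)

for name, value in scores.items():
    print(f"random forest {name}: {value:.3f}")
print(f"k-means accuracy after label mapping: {kmeans_accuracy:.3f}")
```

Mapping clusters to majority classes is one possible way to put an unsupervised method on the same scale as a classifier. The source does not say how the comparison was actually made, so this step is an assumption.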
Findings from this research indicate that machine learning models, particularly the Random Forest
Classifier, can effectively predict heart disease with high accuracy. Feature importance analysis identified
key risk factors such as age, cholesterol levels, blood pressure, and chest pain type, reinforcing their
significance in heart disease diagnosis. The study highlights the potential of integrating machine learning
models into healthcare systems to facilitate early detection, reduce costs, and enhance patient outcomes.
Future research could explore deep learning techniques, real-time patient monitoring with wearable
technology and the integration of predictive models into clinical decision support systems. This thesis
contributes to the research on using Artificial Intelligence (AI) in healthcare, showing how data analysis
can help improve heart disease diagnosis.
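The feature importance analysis mentioned in the findings could look roughly like the sketch below. It assumes scikit-learn's impurity-based feature_importances_ attribute. The source does not state which importance measure the thesis used. The column names are hypothetical labels attached to synthetic columns, so the resulting ranking says nothing about real risk factors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# ASSUMPTION: hypothetical column names attached to synthetic data,
# used only to show how a ranking is produced.
feature_names = ["age", "cholesterol", "blood_pressure", "chest_pain_type"]

X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one non-negative value per feature, summing to 1.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```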