Machine learning algorithms on heterogeneous data: predicting HIV infection in people who inject drugs in Athens
Keywords
Imbalanced data; Machine learning algorithms; HIV; People who inject drugs
Abstract
Machine learning is developing rapidly, and in recent years its techniques have been applied increasingly in medicine, including to infectious diseases such as HIV infection. At the beginning of 2011, an HIV outbreak occurred among people who inject drugs (PWID) in the metropolitan area of Athens. The University of Athens, in collaboration with the Organization Against Drugs, implemented the ARISTOTLE program with the aim of both testing PWID for HIV and linking them to HIV care. The aim of this thesis is to find the best classifier for HIV infection in PWID.
Data from the ARISTOTLE program on 3320 unique PWID were used. The data covered demographic characteristics, substance use, sexual behavior, and participation in harm reduction programs (opioid substitution therapy, free syringes, etc.). Five classification algorithms (Logistic Regression, Random Forest, Support Vector Machines, k-Nearest Neighbors, and Decision Tree) were applied to the data under five resampling scenarios: 1) no resampling; 2) random undersampling; 3) random oversampling; 4) the synthetic minority oversampling technique; and 5) the adaptive synthetic sampling method. Each scenario was run on all features, on the features retained after feature selection, and on the components obtained from principal components analysis.
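The two random resampling scenarios can be illustrated with a minimal sketch. It uses only the Python standard library and made-up toy data; the function names `random_oversample` and `random_undersample` are hypothetical and are not taken from the thesis, which does not state which implementation was used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sampling with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):  # sampling without replacement
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (illustrative only): 2 positive and 8 negative records.
rows = [[i] for i in range(10)]
labels = [1, 1] + [0] * 8

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
```

In this toy case oversampling yields 8 records per class and undersampling yields 2 per class. The synthetic methods (scenarios 4 and 5) generate new minority records by interpolation rather than duplication and are not covered by this sketch.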
The Random Forest algorithm performed best when combined with random oversampling: sensitivity, accuracy, and AUC were 0.9929, 0.9805, and 0.9967, respectively. With 34 of the 112 features selected, sensitivity, accuracy, and AUC were 0.9929, 0.9751, and 0.9967, respectively.
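For readers less familiar with the three reported metrics, the following is a minimal sketch of how each can be computed for a binary classifier. The labels and scores are made up for illustration, the 0.5 decision threshold is an assumption, and the helper names are hypothetical; none of this reproduces the thesis results.

```python
def sensitivity(y_true, y_pred):
    """Share of actual positives that were predicted positive: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of all records whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive scores higher than a random
    negative, counting ties as one half (area under the ROC curve)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative only): 2 positives, 3 negatives.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens = sensitivity(y_true, y_pred)
acc = accuracy(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and accuracy depend on the chosen threshold, while AUC is computed from the scores directly and summarizes ranking quality across all thresholds.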
In conclusion, HIV infection status in this sample of PWID in Athens was predicted correctly at high rates, which makes these algorithms a potential additional tool for the early diagnosis of HIV cases and for helping to avoid a new HIV outbreak.