COVID-19 Data Analysis with Geographic Information Systems
Keywords
Big data; COVID-19; Geographic Information Systems
Abstract
This research studies the geographical spread of the coronavirus. The coronavirus data consist of the daily numbers of infections and deaths. The dataset currently contains about 60,000 records, which makes it suitable for big data analysis techniques. Spatial data were also studied, including basic geographical and economic data by country (type of population, density, GDP, etc.) and demographic data (population, density, age, etc.) for 2020.
In addition, spatial data of other types are analyzed, such as weather data (e.g., temperature, rainfall) for 2020. The data will be combined and the features will be classified. The target variable will be the daily increase in coronavirus cases, which we will divide into distinct categories.
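As an illustration of how a daily increase could be turned into a categorical target, the following minimal sketch computes the day-over-day percentage change in new cases and assigns each value to a category. The abstract does not state how the increase is measured or where the category boundaries lie, so the percentage definition, the `EDGES` thresholds and the `LABELS` names are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical category boundaries (percent change in daily new cases).
# The abstract does not specify them; these are assumptions for illustration.
EDGES = [0.0, 10.0, 50.0]
LABELS = ["none_or_falling", "low", "medium", "high"]


def daily_increase(new_cases):
    """Percent change in daily new cases from each day to the next."""
    changes = []
    for prev, cur in zip(new_cases, new_cases[1:]):
        if prev > 0:
            changes.append(100.0 * (cur - prev) / prev)
        else:
            # No cases on the previous day: treat any new case as unbounded growth.
            changes.append(float("inf") if cur > 0 else 0.0)
    return changes


def categorize(pct):
    """Map a percent change to a label: (-inf, 0], (0, 10], (10, 50], (50, inf)."""
    return LABELS[bisect_left(EDGES, pct)]


# Made-up daily counts for one country, not real data.
counts = [100, 100, 105, 140, 280, 200]
print([categorize(p) for p in daily_increase(counts)])
# ['none_or_falling', 'low', 'medium', 'high', 'none_or_falling']
```

Each resulting label would serve as the class for that country and day, alongside the economic, demographic and weather features.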
Next, we will train several algorithms, such as k-nearest neighbors, support vector machines (SVM), Decision Tree and Random Forest. We will use 10-fold cross-validation to divide the data into training and test subsets, and we will test various parameter settings to optimize the results of each algorithm. Based on the test data, we will evaluate the classifiers with metrics such as accuracy, sensitivity and specificity to see which algorithm gives the best result.
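The evaluation procedure can be sketched in plain Python, without assuming any particular machine learning library. The helper names `k_fold_indices` and `binary_metrics` are hypothetical, and the metrics are computed one-vs-rest for a single chosen category, which is one possible way to apply sensitivity and specificity to a multi-class target; the abstract does not say which approach was used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Specificity: share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Made-up labels for illustration only.
y_true = ["high", "high", "low", "low", "low"]
y_pred = ["high", "low", "low", "low", "high"]
print(binary_metrics(y_true, y_pred, positive="high"))
```

In a full run, each classifier would be fitted on the training indices of every fold and scored on the matching test indices, with the per-fold metrics averaged before the algorithms are compared.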
The geographical data collected for the coronavirus will be displayed on a map. The correlation of economic, demographic and meteorological data with the spread of the virus can also be mapped. Given the nature of the data, the analysis and display will be done by country.