Management of large-scale spatio-textual data
Keywords
Apache Spark ; Spatio-textual join ; Spatial join ; Web-based platform
Abstract
This thesis addresses the problem of joining large volumes of spatio-textual data. The data consist of records, each comprising a pair of coordinates (lon, lat) and a text. Each record is treated as an object x. Two criteria define the similarity between two objects: their spatial distance and their text similarity.
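The two similarity criteria can be illustrated with a minimal sketch. The abstract does not state which distance or text-similarity measures the thesis uses, so the planar Euclidean distance, the Jaccard similarity over whitespace-separated tokens, the record layout, and the names `is_similar`, `max_distance`, and `min_text_similarity` are all assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Planar distance between two (lon, lat) pairs, in coordinate units.

    Assumption: a real system might prefer a geodesic distance instead.
    """
    return math.hypot(a[0] - b[0], a[1] - b[1])


def jaccard_similarity(text_a, text_b):
    """Jaccard similarity of the lowercase whitespace-token sets of two texts."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_similar(x, y, max_distance, min_text_similarity):
    """Two objects match only if BOTH criteria hold:
    they are close in space and their texts are alike."""
    close_enough = euclidean_distance(x["coords"], y["coords"]) <= max_distance
    alike_enough = jaccard_similarity(x["text"], y["text"]) >= min_text_similarity
    return close_enough and alike_enough


# Hypothetical records: coordinates as (lon, lat) plus a free-text description.
x = {"coords": (23.72, 37.98), "text": "coffee shop near acropolis"}
y = {"coords": (23.73, 37.98), "text": "coffee shop acropolis view"}
```

With these two hypothetical records, `is_similar(x, y, 0.02, 0.5)` holds because the objects are about 0.01 coordinate units apart and share three of five distinct tokens. Tightening either threshold far enough makes the pair fail the join condition.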
To solve this problem, an algorithm was developed. It was implemented in a distributed computing environment using Apache Spark, exploiting the data-processing power of each computer's processor. The algorithm was also adapted to exploit the capabilities of graphics processors (GPUs), using Nvidia tools.
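The abstract does not describe the algorithm itself. As a rough illustration of how a spatio-textual join can be made amenable to distributed execution, the sketch below uses uniform grid partitioning, a common technique for spatial joins. It is not the thesis's method: the grid scheme, the Euclidean and Jaccard measures, the `(id, (lon, lat), text)` record layout, and the names `cell_of` and `grid_join` are all assumptions. The sketch runs on a single machine; in a distributed setting the cell identifier could serve as the key by which records are grouped across workers.

```python
import math
from collections import defaultdict


def cell_of(point, cell_size):
    """Map a (lon, lat) point to the integer grid cell that contains it."""
    return (math.floor(point[0] / cell_size), math.floor(point[1] / cell_size))


def grid_join(left, right, max_distance, min_jaccard):
    """Return sorted (left_id, right_id) pairs that satisfy both criteria.

    Records are (id, (lon, lat), text) tuples. The cell size equals
    max_distance (which must be positive), so every spatial match of a
    point lies in its own cell or in one of the 8 neighbouring cells.
    """
    # Index the right-hand dataset by grid cell.
    grid = defaultdict(list)
    for rec in right:
        grid[cell_of(rec[1], max_distance)].append(rec)

    results = []
    for lid, lpt, ltext in left:
        cx, cy = cell_of(lpt, max_distance)
        ltokens = set(ltext.lower().split())
        # Only candidates in the 3x3 block of cells around the point are checked.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for rid, rpt, rtext in grid.get((cx + dx, cy + dy), ()):
                    if math.hypot(lpt[0] - rpt[0], lpt[1] - rpt[1]) > max_distance:
                        continue  # fails the distance criterion
                    rtokens = set(rtext.lower().split())
                    union = ltokens | rtokens
                    if union and len(ltokens & rtokens) / len(union) >= min_jaccard:
                        results.append((lid, rid))
    return sorted(results)


# Hypothetical toy datasets.
left = [("a", (0.0, 0.0), "pizza bar"), ("b", (10.0, 10.0), "book store")]
right = [
    ("c", (0.5, 0.5), "pizza bar grill"),
    ("d", (0.2, 0.1), "hardware store"),
    ("e", (10.4, 9.9), "book store"),
]
pairs = grid_join(left, right, max_distance=1.0, min_jaccard=0.5)
```

Restricting comparisons to neighbouring cells avoids examining every pair of records, which is what makes the all-pairs formulation impractical at large scale.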
Finally, a web-based platform was created for executing spatio-textual join algorithms in Apache Spark. On this platform, each user can upload an algorithm that solves the problem, execute it, and visualize the results in a chart. The platform can also visualize the initial dataset that the algorithm takes as input.