Επεξεργασία χωρο-κειμενικών συζεύξεων για δεδομένα μεγάλης κλίμακας
Large-scale processing of spatio-textual joins

Keywords
Spatio-textual ; Spatial ; Textual ; Joins ; Συζεύξεις ; Κατανεμημένα ; Partitioning ; Tokens ; Load ; Balancing ; Εξισορρόπηση
Abstract
Searching for objects that lie within a user-defined distance of each other and also have similar textual features is now a very common task. Big Data are everywhere and pose a challenge for the scientific community. Fast processing of big data sets is important, so it is essential to run this kind of search on systems that support distributed processing. It is also useful to implement efficient methods that reduce the time cost.
For this Thesis, we worked in a centralized environment: we investigated methods for distributing and processing spatio-textual data, and we performed a simulation that can serve as a guide for implementation in distributed frameworks. We first looked into distributing data based on their textual part, and then worked on distributing data based on their spatial features. For the first method, we took advantage of token frequencies across the data set. For the second method, we partitioned space into zones, using a data sample and the percentiles of its coordinate values. Both methods can achieve a balanced distribution, and each has its own advantages.
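
The two distribution strategies can be illustrated with a minimal Python sketch. It is not the implementation used in the Thesis: the function names, the greedy token placement, and the choice of vertical zones along the x axis are assumptions made only for illustration. The first function balances partitions by total token frequency; the other two cut space at equally spaced percentiles of a sample of x coordinates.

```python
import bisect
import heapq
import statistics
from collections import Counter


def assign_tokens_by_frequency(token_sets, num_partitions):
    """Greedily map each token to a partition, balancing total token frequency.

    Hypothetical helper: the greedy placement is an assumed heuristic,
    not necessarily the scheme used in the Thesis.
    """
    # Count in how many objects each token appears.
    freq = Counter(tok for tokens in token_sets for tok in set(tokens))
    # Min-heap of (current load, partition id); the lightest partition is on top.
    heap = [(0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    assignment = {}
    # Place the most frequent tokens first; ties are broken alphabetically.
    for tok, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        assignment[tok] = p
        heapq.heappush(heap, (load + count, p))
    return assignment


def zone_boundaries(sample_xs, num_zones):
    """Cut points at equally spaced percentiles of the sampled x coordinates.

    Returns num_zones - 1 boundaries (assumes vertical zones along x).
    """
    return statistics.quantiles(sample_xs, n=num_zones)


def zone_of(x, boundaries):
    """Index of the zone that contains coordinate x."""
    return bisect.bisect_right(boundaries, x)


if __name__ == "__main__":
    docs = [["cafe", "wifi"], ["cafe", "bar"], ["cafe", "wifi"], ["bar"], ["museum"]]
    print(assign_tokens_by_frequency(docs, 2))

    bounds = zone_boundaries(list(range(1, 101)), 4)
    print(Counter(zone_of(x, bounds) for x in range(1, 101)))
```

Because the boundaries come from percentiles of the sample, each zone receives roughly the same number of objects when the sample is representative. In a distance join, objects closer to a zone boundary than the distance threshold would presumably also have to be sent to the neighbouring zone; that step is left out of this sketch.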