Parallel processing of spatio-textual similarity join query
Master Thesis
Author
Παυλόπουλος, Νικόλαος
Pavlopoulos, Nikolaos
Date
2019-04-01
Advisor
Δουλκερίδης, Χρήστος
Keywords
Parallel processing; Spatio-textual query; Spark; Similarity; Join query
Abstract
With the rapid development of mobile Internet technology, Internet users are shifting from desktop to mobile devices. Modern mobile devices (e.g., smartphones and tablets) are equipped with GPS, which lets users easily obtain their locations, and location-based services (LBS) have been widely deployed. Users who consume location-based services generate more and more spatio-textual data, which contain both textual descriptions and geographical locations.
A spatio-textual similarity join is an important operation in spatio-textual data integration: given two sets of spatio-textual objects, it finds all similar pairs across the two sets, where similarity is quantified by combining spatial proximity and textual relevance. As a simpler example of a spatio-textual query, a user who wants to find Points of Interest (POIs) (e.g., hotels, restaurants) provides a position, a search radius, and some keywords that describe the POI.
In contrast, the Spatio-Textual Similarity Join (STSJ) query returns all pairs of objects that lie within the given radius of each other and have high textual relevance. The main challenge arises when the two datasets are so large that centralized computation is impractical or infeasible. The need for scalable processing of large datasets therefore motivates the use of big data technologies to parallelize the computation.
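To make the query definition concrete, the following is a minimal centralized sketch of an STSJ as a nested-loop join. It is not the thesis's method: the use of Euclidean distance for spatial proximity, Jaccard similarity over keyword sets for textual relevance, and the names `stsj_naive`, `radius`, and `min_sim` are all assumptions made for illustration. Its quadratic cost is exactly what makes a single-machine approach impractical for large inputs.

```python
from math import hypot


def jaccard(a, b):
    """Jaccard similarity of two keyword collections (assumed textual measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def stsj_naive(left, right, radius, min_sim):
    """Return (left_id, right_id) pairs that satisfy both thresholds.

    Each object is a tuple (id, x, y, keywords). Every pair is compared,
    so the cost is O(|left| * |right|).
    """
    result = []
    for lid, lx, ly, lkw in left:
        for rid, rx, ry, rkw in right:
            close = hypot(lx - rx, ly - ry) <= radius    # spatial condition
            similar = jaccard(lkw, rkw) >= min_sim       # textual condition
            if close and similar:
                result.append((lid, rid))
    return result


# Hypothetical sample data: one object on the left, three on the right.
left = [("a", 0.0, 0.0, {"hotel", "pool"})]
right = [
    ("x", 1.0, 0.0, {"hotel", "pool", "spa"}),  # near and textually similar
    ("y", 10.0, 0.0, {"hotel", "pool"}),        # textually similar but too far
    ("z", 0.0, 1.0, {"bar"}),                   # near but textually dissimilar
]
pairs = stsj_naive(left, right, radius=2.0, min_sim=0.5)
```

Only the pair ("a", "x") satisfies both conditions in this sample. A parallel version would typically partition the space so that each worker compares only objects that can possibly fall within the radius of each other.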