Grid based hybrid search for spatio-textual data

Σασάτη, Ηλίας; Sasati, Ilias

Master Thesis

Συγγραφέας

Σασάτη, Ηλίας

Sasati, Ilias

Ημερομηνία

2025-06

Περίληψη

In this thesis, we present a new approach to the approximate similarity search problem over spatio-textual data, where queries involve both geographic locations and semantically rich text. Unlike traditional approaches that rely on exact keyword matching, our method leverages semantic vector representation (word embeddings) that captures the underlying meaning of textual content. We have developed an algorithm that efficiently processes spatio-textual queries in high dimensional vector space, combining spatial proximity with semantic similarity. Building a combined index is challenging, especially when one needs to prioritize either textual or spatial relevance. Traditionally, such combined indexes rely on keyword search, which limits contextual understanding. But what if we wanted to answer semantic queries such as: “Tweets about people getting new jobs in tech” in California? We address this by developing a grid-based similarity search algorithm, and we use geo-tagged data from Twitter to evaluate its performance. Our approach consists of three steps. In the first step, we divide the data spatially into a uniform grid, whose resolution and implications are examined. In the second step, we build a graph-based index for the semantic vectors using the FAISS library, based on the Hierarchical Navigable Small Worlds (HNSW) algorithm. Finally, we address the pruning properties of the algorithm for efficient search by avoiding checking the entire dataset. Experimental results show that our algorithm maintains high recall even when spatial relevance is weighted more heavily than textual content, despite the fact that the underlying index is primarily optimized for text. We demonstrate that our method can achieve recall of up to 80–85%, with a 20x improvement in execution time compared to a linear scan of the dataset.

Τίτλος Προγράμματος Μεταπτυχιακών Σπουδών

Πληροφοριακά Συστήματα και Υπηρεσίες

Τμήμα

Σχολή Τεχνολογιών Πληροφορικής και Επικοινωνιών. Τμήμα Ψηφιακών Συστημάτων

Αριθμός σελίδων

Γλώσσα

Αγγλικά

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/17962

Συλλογή

Τμήμα Ψηφιακών Συστημάτων

Εμφάνιση πλήρους εγγραφής

Εκτός από όπου διευκρινίζεται διαφορετικά, το τεκμήριο διανέμεται με την ακόλουθη άδεια:
Αναφορά Δημιουργού 3.0 Ελλάδα