
dc.contributor.advisor: Δουλκερίδης, Χρήστος
dc.contributor.advisor: Doulkeridis, Christos
dc.contributor.author: Σασάτη, Ηλίας
dc.contributor.author: Sasati, Ilias
dc.date.accessioned: 2025-07-16T09:54:15Z
dc.date.available: 2025-07-16T09:54:15Z
dc.date.issued: 2025-06
dc.identifier.uri: https://dione.lib.unipi.gr/xmlui/handle/unipi/17962
dc.format.extent: 76 [el]
dc.language.iso: en [el]
dc.publisher: University of Piraeus [el]
dc.rights: Attribution 3.0 Greece [*]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/gr/ [*]
dc.title: Grid based hybrid search for spatio-textual data [el]
dc.type: Master Thesis [el]
dc.contributor.department: School of Information and Communication Technologies. Department of Digital Systems [el]
dc.description.abstractEN: In this thesis, we present a new approach to approximate similarity search over spatio-textual data, where queries involve both geographic locations and semantically rich text. Unlike traditional approaches that rely on exact keyword matching, our method uses semantic vector representations (word embeddings) that capture the underlying meaning of textual content. We have developed an algorithm that efficiently processes spatio-textual queries in a high-dimensional vector space, combining spatial proximity with semantic similarity. Building a combined index is challenging, especially when either textual or spatial relevance must be prioritized. Traditionally, such combined indexes rely on keyword search, which limits contextual understanding. But what if we wanted to answer a semantic query such as "Tweets about people getting new jobs in tech" in California? We address this by developing a grid-based similarity search algorithm, and we evaluate its performance on geo-tagged data from Twitter. Our approach consists of three steps. First, we partition the data spatially into a uniform grid, and we examine the choice of grid resolution and its implications. Second, we build a graph-based index for the semantic vectors using the FAISS library, based on the Hierarchical Navigable Small Worlds (HNSW) algorithm. Finally, we examine the pruning properties of the algorithm, which make search efficient by avoiding a check of the entire dataset. Experimental results show that our algorithm maintains high recall even when spatial relevance is weighted more heavily than textual content, although the underlying index is primarily optimized for text. We show that our method achieves recall of up to 80–85% with a 20x improvement in execution time compared to a linear scan of the dataset. [el]
dc.contributor.master: Information Systems and Services [el]
dc.subject.keyword: KNN [el]
dc.subject.keyword: FAISS [el]
dc.subject.keyword: Vector search [el]
dc.subject.keyword: ANN [el]
dc.subject.keyword: Hybrid search [el]
dc.subject.keyword: Spatio-textual queries [el]
dc.subject.keyword: Approximate Nearest Neighbors [el]
dc.subject.keyword: Hierarchical Navigable Small Worlds [el]
dc.subject.keyword: FAISS Library [el]
dc.subject.keyword: Similarity search [el]
dc.subject.keyword: k-Nearest Neighbors [el]
dc.date.defense: 2025-07-07
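
The abstract outlines a uniform spatial grid, a vector index, and pruning that avoids scanning the whole dataset. The following is a minimal Python sketch of how grid-based pruning can combine the two relevance signals; it is not the thesis's implementation. It assumes a hypothetical linear score, alpha * spatial distance + (1 - alpha) * cosine distance (lower is better), and it replaces the FAISS HNSW index with an exact scan of each visited cell. Unlike the approximate method described in the abstract, it therefore returns exact top-k results. The names UniformGrid, knn, min_dist, alpha and cell are illustrative only.

```python
import heapq
import math
from collections import defaultdict


def cosine_distance(a, b):
    """1 - cosine similarity, clamped at 0 so it is never negative."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return max(0.0, 1.0 - dot / norm)


class UniformGrid:
    """Hypothetical uniform grid over 2-D points; each cell has side length `cell`."""

    def __init__(self, points, vectors, cell):
        self.points, self.vectors, self.cell = points, vectors, cell
        self.cells = defaultdict(list)  # (i, j) cell key -> list of object ids
        for oid, (x, y) in enumerate(points):
            self.cells[(math.floor(x / cell), math.floor(y / cell))].append(oid)

    def min_dist(self, key, qx, qy):
        """Smallest possible Euclidean distance from (qx, qy) to any point in cell `key`."""
        x0, y0 = key[0] * self.cell, key[1] * self.cell
        dx = max(x0 - qx, 0.0, qx - (x0 + self.cell))
        dy = max(y0 - qy, 0.0, qy - (y0 + self.cell))
        return math.hypot(dx, dy)

    def knn(self, q_point, q_vec, k, alpha):
        """Top-k ids by alpha * spatial + (1 - alpha) * cosine distance (lower is better).

        Returns (list of (score, id) sorted by score, number of cells scanned).
        """
        qx, qy = q_point
        # Best-first order: non-empty cells sorted by their spatial lower bound.
        order = sorted(self.cells, key=lambda c: self.min_dist(c, qx, qy))
        heap = []  # max-heap of the k best so far, stored as (-score, id)
        scanned = 0
        for key in order:
            # The textual term is >= 0, so alpha * min_dist bounds every score in this
            # cell and in all later cells; stop once it cannot beat the k-th best.
            if len(heap) == k and alpha * self.min_dist(key, qx, qy) >= -heap[0][0]:
                break
            scanned += 1
            # Exact scan of the cell (stands in for the FAISS HNSW index).
            for oid in self.cells[key]:
                px, py = self.points[oid]
                spatial = math.hypot(px - qx, py - qy)
                textual = cosine_distance(q_vec, self.vectors[oid])
                score = alpha * spatial + (1 - alpha) * textual
                if len(heap) < k:
                    heapq.heappush(heap, (-score, oid))
                elif score < -heap[0][0]:
                    heapq.heapreplace(heap, (-score, oid))
        return sorted((-neg, oid) for neg, oid in heap), scanned
```

Because cosine distance is never negative, alpha times a cell's minimum distance to the query is a lower bound on the score of every object in that cell. Visiting cells in ascending order of that bound lets the search stop as soon as the bound reaches the current k-th best score. A larger alpha (more weight on space) makes the bound tighter and prunes more cells. In practice the spatial and textual terms would need to be normalized to comparable ranges before they are mixed.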



Except where otherwise noted, this item is distributed under the following license:
Attribution 3.0 Greece

University of Piraeus Library
The creation and enrichment of the Institutional Repository "Dione" were carried out within the project "Institutional Repository and Digital Library Service" of the action "Open-Access Digital Services of the University of Piraeus Library".