Στατιστικά, πιθανοτικά και γλωσσικά μοντέλα συνάφειας και τεχνικές μετα-αναζήτησης στην ανάκτηση πληροφοριών

Πέτρου, Κωνσταντίνος

Statistical, probabilistic and linguistic relevance models and meta-search techniques in information retrieval

Master's Thesis

Author

Πέτρου, Κωνσταντίνος

Date

2026-02

Abstract

This thesis examines the field of Information Retrieval, focusing on the design, implementation and evaluation of document ranking models and meta-search techniques. The theoretical part covers basic concepts: Boolean retrieval, vector space models, probabilistic models, language models, and evaluation with Precision and Recall. In the practical part, an Information Retrieval system was developed with Apache Lucene 8.0.0, and four ranking models were implemented and tested: BM25 Similarity, TF-IDF Similarity, the Dirichlet Similarity language model and the Jelinek-Mercer Similarity language model. To evaluate the effectiveness of these models, a custom Perl 5 library was written to calculate and visualize the Precision and Recall metrics. In addition to the individual retrieval models, two meta-search engines were built with the Python library ranx, implementing rank fusion techniques such as CombSUM, CombMAX, Reciprocal Rank Fusion (RRF) and ProbFuse. Experimental results compare the performance of the individual Lucene models; the findings highlight the relative advantages of each approach and offer insight into how relevance strategies can improve retrieval efficiency. For the meta-search experiments, the top 10 results produced by the individual models and by the fusion techniques are presented. This part of the work is limited to computing those results, which lays the foundation for a future evaluation and analytical comparison of the different approaches. It nonetheless gives useful indications of how the models and meta-search techniques behave, and points toward further optimization and the development of more efficient information retrieval systems. Overall, the work demonstrates the importance of model selection, evaluation and combination in building efficient and effective information retrieval systems.
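As an illustration of three of the fusion techniques named in the abstract, and of the Precision and Recall metrics, the following is a minimal Python sketch. It is not the thesis code and does not use the ranx API. The function names, the document identifiers and the scores are all hypothetical, the scores are assumed to be already normalized to a common range, and the RRF constant `k=60` is a commonly used default rather than a value taken from the thesis.

```python
from collections import defaultdict


def comb_sum(runs):
    """CombSUM: sum a document's scores over all runs that retrieved it."""
    fused = defaultdict(float)
    for run in runs:
        for doc_id, score in run.items():
            fused[doc_id] += score
    return dict(fused)


def comb_max(runs):
    """CombMAX: keep the highest score a document received in any run."""
    fused = {}
    for run in runs:
        for doc_id, score in run.items():
            fused[doc_id] = max(score, fused.get(doc_id, score))
    return fused


def rrf(runs, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over runs, ignoring raw scores."""
    fused = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return dict(fused)


def top_k(scores, k=10):
    """Return the k best document ids; ties are broken by document id."""
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def precision_recall(ranking, relevant):
    """Precision and Recall of a ranked list against a set of relevant ids."""
    hits = sum(1 for doc_id in ranking if doc_id in relevant)
    precision = hits / len(ranking) if ranking else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical, already normalized scores from two retrieval models.
run_a = {"d1": 1.0, "d2": 0.6, "d3": 0.1}
run_b = {"d2": 1.0, "d3": 0.7, "d4": 0.2}

if __name__ == "__main__":
    for name, fuse in (("CombSUM", comb_sum), ("CombMAX", comb_max), ("RRF", rrf)):
        print(name, top_k(fuse([run_a, run_b])))
```

CombSUM and CombMAX operate on scores, so they only make sense when the input runs are on comparable scales. RRF uses ranks only, which is why it needs no normalization step.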

Postgraduate Studies Programme

Informatics

Department

School of Information and Communication Technologies. Department of Informatics

Language

Greek

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/18970

Collections

Department of Informatics

Except where otherwise noted, this item's license is described as
Attribution-NonCommercial-NoDerivs 3.0 Greece