Στατιστικά, πιθανοτικά και γλωσσικά μοντέλα συνάφειας και τεχνικές μετα-αναζήτησης στην ανάκτηση πληροφοριών
Statistical, probabilistic and linguistic relevance models and meta-search techniques in information retrieval

Keywords
Information retrieval; Meta search; Lucene
Abstract
This thesis explores the field of Information Retrieval, focusing on the design, implementation
and evaluation of document ranking models and meta-search techniques.
First, a theoretical foundation was established by studying core concepts: Boolean
retrieval, vector space models, probabilistic models, language models, and the Precision and
Recall evaluation measures.
In the practical phase, an Information Retrieval system was developed using Apache
Lucene 8.0.0. Four ranking models were implemented and tested: BM25 Similarity, TF-IDF
Similarity, and two language-model similarities, one with Dirichlet smoothing and one with
Jelinek-Mercer smoothing. To evaluate the effectiveness of these models, a custom Perl 5
library was created to calculate and visualize the Precision and Recall metrics.
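As an illustration of the evaluation step, the following minimal sketch computes Precision and Recall at a rank cutoff for a single query. It is written in Python for brevity and is not the thesis's Perl 5 library; the function name `precision_recall_at_k`, the document identifiers and the relevance judgments are all hypothetical.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Return (precision@k, recall@k) for one query.

    ranked_ids   -- document ids in the order the system returned them
    relevant_ids -- set of ids judged relevant for the query
    k            -- rank cutoff (must be positive)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    # Count how many of the top-k retrieved documents are relevant.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k
    # Recall is undefined without relevant documents; report 0.0 by convention here.
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Hypothetical ranking and relevance judgments for a single query.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
p, r = precision_recall_at_k(ranked, relevant, k=4)
print(f"P@4 = {p:.3f}, R@4 = {r:.3f}")
```

Sweeping `k` from 1 to the length of the ranking gives the points of a precision-recall curve for that query.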
In addition to the individual retrieval models, two meta-search engines were built using the
ranx Python library, implementing several rank fusion techniques, such as CombSUM,
CombMAX, Reciprocal Rank Fusion (RRF), and ProbFuse.
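To make three of these techniques concrete, the sketch below implements Reciprocal Rank Fusion, CombSUM and CombMAX in plain Python. It does not use the ranx API and is not the thesis's code. The smoothing constant `k=60` for RRF, the min-max score normalization before CombSUM/CombMAX, the function names and the two example runs are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document gets the sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


def comb_fuse(runs, mode="sum"):
    """CombSUM (mode='sum') or CombMAX (mode='max') over min-max normalized scores."""
    if mode not in ("sum", "max"):
        raise ValueError("mode must be 'sum' or 'max'")
    fused = defaultdict(float)
    for run in runs:
        lo, hi = min(run.values()), max(run.values())
        span = hi - lo
        for doc_id, score in run.items():
            # Map each run's scores to [0, 1] so different models are comparable.
            norm = (score - lo) / span if span else 1.0
            if mode == "sum":
                fused[doc_id] += norm
            else:
                fused[doc_id] = max(fused[doc_id], norm)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical score runs from two retrieval models for the same query.
bm25_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
lm_run = {"d2": -4.0, "d3": -5.0, "d4": -8.0}

# RRF only needs the rank order, so derive rankings from the scores.
rankings = [sorted(run, key=run.get, reverse=True) for run in (bm25_run, lm_run)]

rrf_result = reciprocal_rank_fusion(rankings)
comb_sum_result = comb_fuse([bm25_run, lm_run], mode="sum")
comb_max_result = comb_fuse([bm25_run, lm_run], mode="max")
print(rrf_result, comb_sum_result, comb_max_result)
```

RRF works on ranks alone, which sidesteps the problem that scores from different models (for example BM25 scores and language-model log-probabilities) lie on different scales; CombSUM and CombMAX work on scores and therefore need some normalization step first.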
Experimental results are presented that compare the performance of the individual Lucene
models. The findings highlight the relative advantages of each approach and offer insights
into how relevance strategies can improve retrieval efficiency. For the meta-search
experiments, the top 10 results returned by the individual models and by the fusion
techniques are presented. This part of the work focuses on computing these results, laying
the foundation for a future evaluation and analytical comparison of the different approaches.
The results nonetheless give useful indications of how the models and meta-search
techniques behave, and can guide further optimization and the development of more efficient
information retrieval systems.
This work demonstrates the importance of model selection, evaluation and combination in
building efficient and effective information retrieval systems.


