Benchmarking open-source embedding & language models for Greek: performance, accuracy, and completeness

Master's Thesis
Author
Spyrakou, Spyridoula
Date
2026-02
Supervisor
Stamatatos, Efstathios
Keywords
Natural language processing ; Transformer encoder models ; NLP ; Domain benchmarking ; LLMs ; Large language models ; Calibration ; Domain-specialised pretrained models ; Multilingual models ; Monolingual Greek models
Abstract
This thesis benchmarks open-source Transformer-based encoder models and, in a controlled setting, adapter-tuned large language models for Natural Language Processing in Greek across three registers/domains: formal language (legal documents), informal language (social media and online comments), and conversational language (spoken language understanding). Using a unified fine-tuning and evaluation protocol, we compare multilingual, monolingual Greek, and domain-specialised pretrained models on multi-granular legal topic classification, multi-label legal tagging, legal named entity recognition, offensive language identification, comment acceptance prediction, and intent classification and slot filling. Beyond headline performance, we analyse calibration (Expected Calibration Error, Brier score, negative log-likelihood), learning stability across random seeds, and the effect of alternative optimisation regimes. Results show that optimisation strategies are task-structured rather than universally beneficial: class-weighted loss improves imbalanced document classification but can degrade BIO-style sequence labelling. In the conversational intent setting, we evaluate Low-Rank Adaptation (LoRA) for Greek-compatible large language models and find that strong fully fine-tuned Greek encoders remain highly competitive under the same protocol. Overall, the thesis provides a reusable cross-domain benchmarking framework and empirically grounded guidance on when language/domain matching, calibration, and adaptation strategy matter most for NLP applications in Greek.
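The three calibration metrics named in the abstract (Expected Calibration Error, Brier score, negative log-likelihood) are standard quantities; a minimal sketch of how they can be computed for a binary classifier is given below. The function names, the equal-width binning, and the example values are illustrative assumptions, not the thesis's actual evaluation code.

```python
import math

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins: the weighted average of
    |accuracy - mean confidence| over the bins (binary case)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p, 1 - p)               # confidence of the predicted class
        pred = 1 if p >= 0.5 else 0
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, int(pred == y)))
    ece, n = 0.0, len(probs)
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(hit for _, hit in b) / len(b)
            ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and true label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def negative_log_likelihood(probs, labels, eps=1e-12):
    """Mean NLL of the true class, with probabilities clipped away from 0/1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1 - p
        total -= math.log(min(max(p_true, eps), 1 - eps))
    return total / len(probs)
```

A perfectly calibrated, perfectly accurate predictor scores zero on all three metrics, while an overconfident model can keep high accuracy yet show a large ECE, which is why the thesis reports calibration alongside headline performance.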


