Benchmarking open-source embedding & language models for Greek: performance, accuracy, and completeness

Master's Thesis
Author
Spyrakou, Spyridoula
Date
2026-02
Supervisor
Stamatatos, Efstathios
Keywords
Natural language processing ; Transformer encoder models ; NLP ; Domain benchmarking ; LLMs ; Large language models ; Calibration ; Domain-specialised pretrained models ; Multilingual models ; Monolingual Greek models
Abstract
This thesis benchmarks open-source Transformer-based encoder models and, in a controlled setting, adapter-tuned large language models for Natural Language Processing in Greek across three registers/domains: formal language (legal documents), informal language (social media and online comments), and conversational language (spoken language understanding). Using a unified fine-tuning and evaluation protocol, we compare multilingual, monolingual Greek, and domain-specialised pretrained models on multi-granular legal topic classification, multi-label legal tagging, legal named entity recognition, offensive language identification, comment acceptance prediction, and intent classification and slot filling. Beyond headline performance, we analyse calibration (Expected Calibration Error, Brier score, negative log-likelihood), learning stability across random seeds, and the effect of alternative optimisation regimes. Results show that optimisation strategies are task-structured rather than universally beneficial: class-weighted loss improves imbalanced document classification but can degrade BIO-style sequence labelling. In the conversational intent setting, we evaluate Low-Rank Adaptation (LoRA) for Greek-compatible large language models and find that strong fully fine-tuned Greek encoders remain highly competitive under the same protocol. Overall, the thesis provides a reusable cross-domain benchmarking framework and empirically grounded guidance on when language/domain matching, calibration, and adaptation strategy matter most for NLP applications in Greek.
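The three calibration metrics named in the abstract (Expected Calibration Error, Brier score, negative log-likelihood) are standard quantities; a minimal sketch of how they can be computed for a binary classifier is given below. The function names, the equal-width binning, and the example values are illustrative assumptions, not the thesis's actual evaluation code.

```python
import math

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins: the weighted average of
    |accuracy - mean confidence| over the bins (binary case)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p, 1 - p)               # confidence of the predicted class
        pred = 1 if p >= 0.5 else 0
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, int(pred == y)))
    ece, n = 0.0, len(probs)
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(hit for _, hit in b) / len(b)
            ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and true label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def negative_log_likelihood(probs, labels, eps=1e-12):
    """Mean NLL of the true class, with probabilities clipped away from 0/1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1 - p
        total -= math.log(min(max(p_true, eps), 1 - eps))
    return total / len(probs)
```

A perfectly calibrated, perfectly accurate predictor scores zero on all three metrics, while an overconfident model can keep high accuracy yet show a large ECE, which is why the thesis reports calibration alongside headline performance.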


