Out-of-distribution detection of machine generated text

Kampouridis, Prodromos; Καμπουρίδης, Πρόδρομος

dc.contributor.advisor	Stamatatos, Efstathios
dc.contributor.advisor	Σταματάτος, Ευστάθιος
dc.contributor.author	Kampouridis, Prodromos
dc.contributor.author	Καμπουρίδης, Πρόδρομος
dc.date.accessioned	2026-03-23T12:22:54Z
dc.date.available	2026-03-23T12:22:54Z
dc.date.issued	2026-02
dc.identifier.uri	https://dione.lib.unipi.gr/xmlui/handle/unipi/19043
dc.format.extent	65	el
dc.language.iso	en	el
dc.publisher	University of Piraeus	el
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.title	Out-of-distribution detection of machine generated text	el
dc.type	Master Thesis	el
dc.contributor.department	School of Information and Communication Technologies. Department of Digital Systems	el
dc.description.abstractEN	Detecting machine generated text is increasingly important as Large Language Models (LLMs) evolve rapidly. In practice, detectors often fail to generalize Out-of-Distribution (OOD), degrading under domain shifts, topic changes, unseen generators, and paraphrasing attacks. This thesis studies whether compact machine generated text detectors can retain stronger OOD robustness through teacher-student training. The student is optimized with supervised cross-entropy, optional logit-based knowledge distillation via temperature-scaled KL divergence, and teacher-guided representation alignment using triplet loss and supervised contrastive learning. Experiments on the MAGE benchmark follow its OOD testbed protocol across unseen domain, unseen model, combined shift, and paraphrasing settings. To support deployment-oriented evaluation, performance is reported both before and after decision-threshold calibration. The results show that triplet-based teacher guidance is the strongest distillation strategy among the distilled variants, with the best final model combining cross-entropy, knowledge distillation, and teacher-guided triplet alignment. Overall, the proposed distilled detector is competitive with the MAGE Longformer baseline on standard OOD settings and achieves substantially lower inference latency, yielding a lightweight and practically efficient detector for robust machine generated text detection beyond the training distribution.	el
dc.corporate.name	National Center of Scientific Research "Demokritos"	el
dc.contributor.master	Artificial Intelligence	el
dc.subject.keyword	Out-of-distribution detection	el
dc.subject.keyword	Machine generated text detection	el
dc.subject.keyword	Knowledge distillation	el
dc.subject.keyword	Triplet loss	el
dc.date.defense	2026-02
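
The training objective summarized in the abstract (supervised cross-entropy, temperature-scaled KL distillation, and teacher-guided triplet alignment) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the loss weights, the temperature and margin values, the T² scaling of the KL term (a common convention), and the choice of teacher embeddings as positive and negative examples are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, label):
    # Supervised loss against the gold label (e.g. 0 = human, 1 = machine).
    return -math.log(softmax(student_logits)[label])

def distillation_kl(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    # The T^2 factor is an assumed convention, not taken from the thesis.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Assumed setup: anchor is a student embedding, positive and negative
    # are teacher embeddings of a same-class and an other-class text.
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

def combined_loss(student_logits, teacher_logits, label,
                  anchor, positive, negative,
                  kd_weight=0.5, triplet_weight=0.5,
                  temperature=2.0, margin=1.0):
    # Weighted sum of the three terms; the weights are hypothetical.
    return (cross_entropy(student_logits, label)
            + kd_weight * distillation_kl(student_logits, teacher_logits,
                                          temperature)
            + triplet_weight * triplet_loss(anchor, positive, negative,
                                            margin))
```

When the student already matches the teacher's logits and its embedding coincides with the positive example, the distillation and triplet terms vanish and only the cross-entropy term remains.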

Files in this item

Name: Kampouridis_2203.pdf
Size: 890.9Kb
Format: PDF
Description: Master thesis

This item appears in the following collections

Department of Digital Systems

Except where otherwise noted, this item is distributed under the following license:
Attribution-NonCommercial-NoDerivs 3.0 Greece