Fake news detection in a headline-only setting : a comparative study of machine learning and deep learning models on the GossipCop dataset

dc.contributor.advisor: Filippakis, Michael
dc.contributor.advisor: Φιλιππάκης, Μιχαήλ
dc.contributor.author: Sotiropoulos, Dionysios
dc.contributor.author: Σωτηρόπουλος, Διονύσιος
dc.date.accessioned: 2026-05-27T06:06:13Z
dc.date.available: 2026-05-27T06:06:13Z
dc.date.issued: 2026-04
dc.identifier.uri: https://dione.lib.unipi.gr/xmlui/handle/unipi/19385
dc.format.extent: 115 [el]
dc.language.iso: en [el]
dc.publisher: University of Piraeus [el]
dc.rights: Attribution-NonCommercial-NoDerivatives 3.0 Greece [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ [*]
dc.title: Fake news detection in a headline-only setting : a comparative study of machine learning and deep learning models on the GossipCop dataset [el]
dc.title.alternative: Ανίχνευση ψευδών ειδήσεων από τίτλους : συγκριτική μελέτη μοντέλων μηχανικής και βαθιάς μάθησης στο σύνολο δεδομένων GossipCop [el]
dc.type: Master Thesis [el]
dc.contributor.department: School of Information and Communication Technologies. Department of Digital Systems [el]
dc.description.abstractEN: The rapid spread of misinformation through digital platforms has made fake news detection an important problem in contemporary data-driven research. This thesis investigates the effectiveness of Machine Learning and Deep Learning approaches for fake news detection in a constrained headline-only setting, where the available textual information is limited and the dataset is significantly imbalanced. The study is based on the GossipCop subset of FakeNewsNet and focuses exclusively on headline text in order to evaluate model behavior under controlled content-based conditions. A comparative experimental framework was developed including four Machine Learning models, namely Logistic Regression, Linear Support Vector Classification, Random Forest, and XGBoost, as well as four Deep Learning approaches: Convolutional Neural Networks, Bidirectional Long Short-Term Memory networks, DistilBERT, and a hybrid DistilBERT + XGBoost model. For the Machine Learning models, headlines were represented using 300-dimensional Doc2Vec embeddings and evaluated with stratified 10-fold cross-validation, while class imbalance was handled through SMOTE applied only to the training folds. For the Deep Learning models, tokenized or transformer-based headline representations were used within a holdout evaluation framework, with class weighting employed where appropriate. The results show a clear performance gap between the two model families. The Machine Learning baselines, especially Logistic Regression and LinearSVC, exhibited weak discriminative performance, while Random Forest and XGBoost improved overall accuracy but remained ineffective in recovering fake news instances. In contrast, the Deep Learning models achieved substantially stronger and more balanced results. DistilBERT provided the best overall balance across ROC-AUC, Cohen’s Kappa, fake-class F1-score, and weighted F1-score.
The hybrid DistilBERT + XGBoost model achieved the highest overall accuracy, although at the cost of lower fake recall, whereas BiLSTM demonstrated the strongest ability to recover fake news instances. The findings indicate that transformer-based and other Deep Learning approaches are more suitable than traditional Machine Learning methods for headline-based fake news detection. At the same time, the study highlights the strong influence of class imbalance and the inherent limitations of headline-only datasets, which restrict contextual depth and constrain model performance. Overall, the thesis shows that fake news detection based solely on headlines remains a relevant but inherently difficult classification problem. [el]
dc.contributor.master: Information Systems and Services [el]
dc.subject.keyword: Fake news detection [el]
dc.subject.keyword: Headline-based classification [el]
dc.subject.keyword: Machine learning [el]
dc.subject.keyword: Deep learning [el]
dc.subject.keyword: Natural language processing [el]
dc.subject.keyword: NLP [el]
dc.subject.keyword: Transformer models [el]
dc.subject.keyword: Hybrid models [el]
dc.subject.keyword: Logistic regression [el]
dc.subject.keyword: Linear SVM [el]
dc.subject.keyword: Random forest [el]
dc.subject.keyword: XGBoost [el]
dc.subject.keyword: CNN [el]
dc.subject.keyword: BiLSTM [el]
dc.subject.keyword: DistilBERT [el]
dc.subject.keyword: Supervised binary classification [el]
dc.subject.keyword: GossipCop dataset [el]
dc.subject.keyword: FakeNewsNet [el]
dc.date.defense: 2026-05-21
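
The abstract states that class imbalance was handled with SMOTE applied only to the training folds of a stratified 10-fold cross-validation. The sketch below illustrates that ordering in plain Python. It is not the thesis code: the function names, the toy two-dimensional data (standing in for the 300-dimensional Doc2Vec vectors), the label convention (1 = fake, assumed to be the minority class), and the simplified nearest-neighbour interpolation used in place of a library SMOTE are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds so each fold keeps the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def smote_like(minority, n_new, n_neighbors, rng):
    """Simplified SMOTE (assumption): interpolate between a minority point
    and one of its nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:n_neighbors]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic


def oversampled_training_folds(X, y, k=10, n_neighbors=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test) per fold.

    Oversampling happens after the split, so synthetic points are created
    from training samples only and never enter the test fold.
    """
    rng = random.Random(seed)
    folds = stratified_folds(y, k, rng)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_train = [X[j] for j in train_idx]
        y_train = [y[j] for j in train_idx]
        # Label 1 (fake) is assumed to be the minority class.
        minority = [x for x, label in zip(X_train, y_train) if label == 1]
        deficit = (len(y_train) - len(minority)) - len(minority)
        new_points = smote_like(minority, max(deficit, 0), n_neighbors, rng)
        X_train += new_points
        y_train += [1] * len(new_points)
        yield X_train, y_train, [X[j] for j in test_idx], [y[j] for j in test_idx]


# Toy imbalanced data: 80 "real" (0) and 20 "fake" (1) points.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(80)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

for X_tr, y_tr, X_te, y_te in oversampled_training_folds(X, y, k=10):
    print(y_tr.count(0), y_tr.count(1), len(X_te), y_te.count(1))
```

Because the synthetic points are generated after the split, each test fold contains only original samples with the original class ratio, so the reported metrics are not inflated by interpolated near-copies of training points.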
