Fake news detection in a headline-only setting : a comparative study of machine learning and deep learning models on the GossipCop dataset

Sotiropoulos, Dionysios; Σωτηρόπουλος, Διονύσιος

dc.contributor.advisor	Filippakis, Michael
dc.contributor.advisor	Φιλιππάκης, Μιχαήλ
dc.contributor.author	Sotiropoulos, Dionysios
dc.contributor.author	Σωτηρόπουλος, Διονύσιος
dc.date.accessioned	2026-05-27T06:06:13Z
dc.date.available	2026-05-27T06:06:13Z
dc.date.issued	2026-04
dc.identifier.uri	https://dione.lib.unipi.gr/xmlui/handle/unipi/19385
dc.format.extent	115	el
dc.language.iso	en	el
dc.publisher	University of Piraeus	el
dc.rights	Attribution-NonCommercial-NoDerivatives 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.title	Fake news detection in a headline-only setting : a comparative study of machine learning and deep learning models on the GossipCop dataset	el
dc.title.alternative	Fake news detection from headlines : a comparative study of machine learning and deep learning models on the GossipCop dataset	el
dc.type	Master Thesis	el
dc.contributor.department	School of Information and Communication Technologies. Department of Digital Systems	el
dc.description.abstractEN	The rapid spread of misinformation through digital platforms has made fake news detection an important problem in contemporary data-driven research. This thesis investigates the effectiveness of Machine Learning and Deep Learning approaches for fake news detection in a constrained headline-only setting, where the available textual information is limited and the dataset is significantly imbalanced. The study is based on the GossipCop subset of FakeNewsNet and uses only headline text in order to evaluate model behavior under controlled, content-based conditions. The comparative experimental framework comprises four Machine Learning models, namely Logistic Regression, Linear Support Vector Classification (LinearSVC), Random Forest, and XGBoost, and four Deep Learning approaches: Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory networks (BiLSTM), DistilBERT, and a hybrid DistilBERT + XGBoost model. For the Machine Learning models, headlines were represented as 300-dimensional Doc2Vec embeddings and evaluated with stratified 10-fold cross-validation; class imbalance was handled with SMOTE applied only to the training folds. For the Deep Learning models, tokenized or transformer-based headline representations were evaluated on a holdout split, with class weighting used where appropriate. The results show a clear performance gap between the two model families. The Machine Learning baselines, especially Logistic Regression and LinearSVC, showed weak discriminative performance; Random Forest and XGBoost improved overall accuracy but remained ineffective at recovering fake news instances. In contrast, the Deep Learning models achieved substantially stronger and more balanced results. DistilBERT provided the best overall balance across ROC-AUC, Cohen’s Kappa, fake-class F1-score, and weighted F1-score.
The hybrid DistilBERT + XGBoost model achieved the highest overall accuracy, although at the cost of lower fake-class recall, whereas BiLSTM showed the strongest ability to recover fake news instances. The findings indicate that transformer-based and other Deep Learning approaches are better suited than traditional Machine Learning methods to headline-based fake news detection. At the same time, the study highlights the strong influence of class imbalance and the inherent limitations of headline-only datasets, which offer little contextual depth and constrain model performance. Overall, the thesis shows that fake news detection based solely on headlines remains a relevant but inherently difficult classification problem.	el
dc.contributor.master	Information Systems and Services	el
dc.subject.keyword	Fake news detection	el
dc.subject.keyword	Headline-based classification	el
dc.subject.keyword	Machine learning	el
dc.subject.keyword	Deep learning	el
dc.subject.keyword	Natural language processing	el
dc.subject.keyword	NLP	el
dc.subject.keyword	Transformer models	el
dc.subject.keyword	Hybrid models	el
dc.subject.keyword	Logistic regression	el
dc.subject.keyword	Linear SVM	el
dc.subject.keyword	Random forest	el
dc.subject.keyword	XGBoost	el
dc.subject.keyword	CNN	el
dc.subject.keyword	BiLSTM	el
dc.subject.keyword	DistilBERT	el
dc.subject.keyword	Supervised binary classification	el
dc.subject.keyword	GossipCop dataset	el
dc.subject.keyword	FakeNewsNet	el
dc.date.defense	2026-05-21
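
The abstract describes the Machine Learning protocol only in outline: stratified 10-fold cross-validation, SMOTE applied to the training folds alone, and evaluation with metrics such as Cohen's Kappa. The thesis code is not part of this record, so the following is a minimal pure-Python sketch of that protocol under stated assumptions, not the author's implementation. It assumes headlines have already been turned into numeric vectors (for example the 300-dimensional Doc2Vec embeddings), binary labels with the fake class as the minority class encoded as `1`, and a from-scratch SMOTE standing in for a library implementation such as imbalanced-learn. All function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that each preserve the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets a proportional share.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE: place each synthetic point on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        x = X_min[i]
        others = [X_min[j] for j in range(len(X_min)) if j != i]
        # Sort the other minority samples by squared Euclidean distance to x.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        nb = rng.choice(others[:k])
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic


def balanced_training_folds(X, y, k=10, minority=1, seed=0):
    """Yield (X_train, y_train, val_idx) per fold. Oversampling is applied to
    the training part only; the held-out fold keeps its original imbalance."""
    for f, val_idx in enumerate(stratified_folds(y, k, seed)):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        X_min = [x for x, t in zip(X_tr, y_tr) if t == minority]
        # Binary labels assumed: the majority count is everything else.
        deficit = (len(y_tr) - len(X_min)) - len(X_min)
        if deficit > 0 and len(X_min) >= 2:
            X_tr += smote(X_min, deficit, seed=seed + f)
            y_tr += [minority] * deficit
        yield X_tr, y_tr, val_idx


def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    if p_e == 1:
        return float("nan")  # undefined when chance agreement is already perfect
    return (p_o - p_e) / (1 - p_e)
```

Keeping the oversampling inside the loop is the point of the design: if SMOTE ran before the split, synthetic points interpolated from validation samples would leak into training and inflate the scores. The kappa function also shows why accuracy alone can mislead on imbalanced data: always predicting the majority class for the labels `[0, 0, 0, 1]` gives 75% accuracy but a kappa of 0.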

Files in this item

Name:: Sotiropoulos_ME2477.pdf
Size:: 3.285Mb
Type:: PDF
Description:: Master thesis

This item appears in the following collection(s)

Department of Digital Systems

Except where otherwise noted, this item is distributed under the following license:
Attribution-NonCommercial-NoDerivatives 3.0 Greece