Sentiment analysis for financial news

Master's Thesis
Author
Savvas, Spyros
Date
2025
Supervisor
Filippakis, Michael
Keywords
Sentiment analysis; Transformer models; Natural Language Processing (NLP); Large Language Models (LLMs); BERT; Parameter-Efficient Fine-Tuning (PEFT)
Abstract
This thesis explores the development and application of sentiment analysis techniques tailored to financial news headlines, using a dataset of 4,838 unique instances. The primary objective is to systematically evaluate and compare diverse methodologies, ranging from traditional lexicon-based approaches to state-of-the-art transformer architectures, for accurate sentiment classification (positive, negative, neutral) in the financial domain.

Methodologically, the study implements and contrasts three categories of models: (1) lexicon-based classifiers using the general-purpose VADER and the domain-specific Loughran-McDonald financial sentiment dictionary; (2) fine-tuned Bidirectional Encoder Representations from Transformers (BERT) models, exploring variations in sentence representation through CLS, Mean, and Max pooling strategies; and (3) the Gemma 7B-IT large language model, fine-tuned as a sequence classifier using parameter-efficient techniques (4-bit quantization and Low-Rank Adaptation, LoRA). All models are evaluated on the same imbalanced test set, and performance is assessed with standard classification metrics, with particular focus on the Macro F1-score to account for the observed class imbalance in the dataset.

Experimental results demonstrate a clear performance advantage of transformer-based models (BERT and Gemma) over lexicon-based approaches, which struggle with nuance and domain specificity. Fine-tuned BERT models achieve strong results: the CLS pooling strategy with non-stratified training data reached up to 88.2% accuracy and a 0.87 Macro F1-score. The Gemma 7B-IT model, fine-tuned as a sequence classifier with stratified training data, demonstrated comparable top-tier performance, achieving 87.2% accuracy and a 0.87 Macro F1-score. This research contributes a valuable comparative benchmark for financial sentiment analysis, underscoring the effectiveness of modern, efficiently fine-tuned large language models and established transformer architectures for domain-specific natural language processing tasks.
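The lexicon-based baseline can be reproduced with off-the-shelf tooling. Below is a minimal sketch of a VADER headline classifier; the ±0.05 compound-score cutoffs are VADER's conventional defaults and are assumed here, since the abstract does not state the thesis's exact mapping.

```python
# Lexicon-based baseline with VADER (pip install vaderSentiment).
# The +/-0.05 compound-score cutoffs are conventional defaults,
# assumed here; the thesis's exact thresholds are not specified.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def vader_label(headline: str) -> str:
    """Map a headline to positive/negative/neutral via the compound score."""
    compound = analyzer.polarity_scores(headline)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(vader_label("Company X posts record quarterly profit"))   # likely "positive"
print(vader_label("Regulator fines Company X for misconduct"))  # likely "negative"
```

A Loughran-McDonald variant would swap the scorer for counts over that dictionary's positive and negative word lists (packages such as pysentiment2 expose it), though the abstract does not state which implementation the thesis used.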
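The three BERT variants differ only in how the encoder's final hidden states are aggregated into a sentence vector before the classification head. The sketch below illustrates CLS, mean, and max pooling in PyTorch with Hugging Face transformers; the base checkpoint (bert-base-uncased), the class name, and the plain linear head are illustrative assumptions, not the thesis's reported configuration.

```python
# Sketch of CLS / Mean / Max pooling over BERT's last hidden states
# (pip install torch transformers). "bert-base-uncased" and the simple
# linear head are assumptions; the abstract does not fix these details.
import torch
import torch.nn as nn
from transformers import AutoModel

class BertSentimentClassifier(nn.Module):
    def __init__(self, pooling: str = "cls", num_labels: int = 3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.pooling = pooling
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()           # (B, T, 1)
        if self.pooling == "cls":
            pooled = hidden[:, 0]                             # [CLS] token state
        elif self.pooling == "mean":
            # average only over real (non-padding) tokens
            pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        elif self.pooling == "max":
            # exclude padding positions from the element-wise max
            pooled = hidden.masked_fill(mask == 0, -1e9).max(dim=1).values
        else:
            raise ValueError(f"unknown pooling: {self.pooling}")
        return self.head(pooled)                              # (B, num_labels) logits
```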
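Loading Gemma 7B-IT as a three-way sequence classifier under 4-bit quantization with LoRA adapters follows the standard transformers/peft/bitsandbytes recipe, sketched below. The NF4 quantization settings, the LoRA rank and alpha, and the attention-projection target modules are illustrative choices, not the thesis's reported hyperparameters.

```python
# PEFT setup for Gemma 7B-IT as a sequence classifier
# (pip install transformers peft bitsandbytes accelerate; needs a GPU).
# NF4, r=16, and the target modules below are illustrative choices,
# not the thesis's reported hyperparameters.
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "google/gemma-7b-it",
    num_labels=3,                      # positive / negative / neutral
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    task_type="SEQ_CLS",               # marks the classification head as trainable
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()     # only adapters + head are trainable
```

The quantized base weights stay frozen; only the low-rank adapter matrices and the classification head are updated, which is what makes full fine-tuning of a 7B-parameter model tractable on a single GPU.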
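Because the test set is imbalanced, the headline metric is the Macro F1-score, which averages per-class F1 with equal weight so that minority classes count as much as the majority class. A minimal evaluation sketch with illustrative labels (not thesis data):

```python
# Macro F1 weights each class equally, so minority-class errors are not
# drowned out by the majority class (illustrative labels, not thesis data).
from sklearn.metrics import classification_report, f1_score

y_true = ["positive", "neutral", "neutral", "negative", "positive", "neutral"]
y_pred = ["positive", "neutral", "positive", "negative", "positive", "neutral"]

print(f1_score(y_true, y_pred, average="macro"))   # unweighted mean of per-class F1
print(classification_report(y_true, y_pred, digits=3))
```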