A cascaded BERT model for sentiment classification
A multi-stage BERT model for sentiment analysis

Bachelor Dissertation
Author
Papadakis, Ioannis
Παπαδάκης, Ιωάννης
Date
2025-09
Keywords
BERT ; Sentiment analysis ; Emotion recognition ; Transformers ; Pre-training ; Fine-tuning ; From-scratch training ; Masked Language Modeling ; Hyperparameter optimization
Abstract
This thesis presents the development and evaluation of a BERT-based (Bidirectional
Encoder Representations from Transformers) model for multi-class emotion
recognition in text. The distinctive contribution of this work is the comprehensive
training of BERT from scratch—including both pre-training and fine-tuning phases—
rather than relying on publicly available pre-trained models. This approach provides
empirical insights into how language models acquire linguistic knowledge and learn
task-specific patterns from data.
The approach follows a two-stage process of pre-training and fine-tuning. In the pre-training stage, a full BERT-base model (12 Transformer layers, ~110M parameters) was trained from random initialization on the English Wikipedia dataset (version 20231101) using the Masked Language Modeling (MLM) objective. Subsequently, the model was fine-tuned for emotion classification on a merged dataset of about 32,000 samples drawn from the Twitter Multi-class Sentiment dataset and an academic Emotion dataset, collected as part of this work; the combined data covers six emotions: Joy, Sadness, Anger, Fear, Love, and Surprise.
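The pre-training stage can be outlined with the Hugging Face transformers and datasets libraries, as in the minimal sketch below. The dataset identifier wikimedia/wikipedia with configuration 20231101.en matches the stated snapshot version but is an assumption, and the sequence length, batch size, masking probability, and output directory are illustrative placeholders rather than the exact settings used in this work.

```python
from datasets import load_dataset
from transformers import (
    BertConfig, BertForMaskedLM, BertTokenizerFast,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

# BERT-base configuration: 12 Transformer layers, 768 hidden units, 12 heads (~110M parameters).
config = BertConfig(
    vocab_size=30522,
    num_hidden_layers=12,
    hidden_size=768,
    num_attention_heads=12,
)
model = BertForMaskedLM(config)  # randomly initialized, no pre-trained weights

# The standard BERT vocabulary is reused for tokenization only; the weights are not.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# English Wikipedia snapshot (assumed Hub identifier for the 20231101 version).
wiki = load_dataset("wikimedia/wikipedia", "20231101.en", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = wiki.map(tokenize, batched=True, remove_columns=wiki.column_names)

# Dynamic masking for the MLM objective: 15% of tokens are masked per batch.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-wiki-mlm", per_device_train_batch_size=32),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```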
The work proceeded through four experimental phases, each yielding key lessons. The first phase was direct supervised training, in which a randomly initialized BERT model was fine-tuned for sentiment classification without any prior pre-training. This approach failed catastrophically on this dataset, reaching only 31% accuracy due to model collapse: the model always predicted the majority class (sadness). This failure provided solid empirical support for the principle that linguistic pre-training must precede task-specific specialization.
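Majority-class collapse of this kind can be detected by comparing the prediction distribution with a baseline that always outputs the most frequent label. The helper below is a hypothetical diagnostic sketch, not the evaluation code of this work.

```python
from collections import Counter
import numpy as np

def diagnose_collapse(predicted_labels, true_labels):
    """Report whether a classifier has degenerated into predicting one class."""
    pred_counts = Counter(predicted_labels)
    majority_label, majority_count = Counter(true_labels).most_common(1)[0]

    accuracy = float(np.mean(np.array(predicted_labels) == np.array(true_labels)))
    majority_baseline = majority_count / len(true_labels)

    print(f"Prediction distribution: {dict(pred_counts)}")
    print(f"Model accuracy:          {accuracy:.3f}")
    print(f"Majority-class baseline: {majority_baseline:.3f} (label: {majority_label})")

    # Collapse: (nearly) all predictions fall on a single label, so accuracy
    # cannot meaningfully exceed the majority-class baseline.
    return max(pred_counts.values()) / len(predicted_labels) > 0.95
```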
The second phase was a proof-of-concept validation of the two-stage approach, using a smaller model (a 6-layer BERT) and the WikiText-2 corpus in a CPU-only environment. This configuration reached about 89% accuracy, showing that the two-phase approach is valid and justifying its deployment at full scale.
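A compact sketch of the reduced proof-of-concept setup follows. Only the 6-layer depth and the WikiText-2 corpus are stated above; the hidden size, number of attention heads, and the Hugging Face dataset identifier wikitext / wikitext-2-raw-v1 are assumptions.

```python
from datasets import load_dataset
from transformers import BertConfig, BertForMaskedLM

# Scaled-down model for the CPU proof of concept: 6 Transformer layers.
small_config = BertConfig(
    num_hidden_layers=6,
    hidden_size=512,
    num_attention_heads=8,
    intermediate_size=2048,
)
small_model = BertForMaskedLM(small_config)

# WikiText-2 is small enough to pre-train on a CPU in reasonable time.
wikitext = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
```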
The third phase focused on building infrastructure and migrating to an NVIDIA L40S 48GB GPU server. This uncovered a number of technical challenges, including operating-system issues (requiring a move from an incompatible CentOS Stream 9 to Ubuntu 22.04 LTS), Python compilation problems, and library dependency conflicts. These were addressed systematically through the creation of a full suite of DevOps automation scripts for deployment, synchronization, and environment setup.
Full-scale training took place in the fourth and final phase: the complete 12-layer BERT model was pre-trained on the entire English Wikipedia dataset and then fine-tuned, with Optuna-based hyperparameter optimization run through the Hugging Face Trainer. The pre-training phase lasted about 2.5 hours on the NVIDIA L40S GPU. Fine-tuning was conducted on the merged Twitter-Emotion dataset, and an automatic hyperparameter search over 10 trials was performed to find the most suitable learning rate and batch size settings.
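Such a search can be wired up through Trainer.hyperparameter_search with the Optuna backend, as in the sketch below. Only the 10 trials and the learning-rate/batch-size search dimensions come from the description above; the checkpoint name bert-wiki-mlm, the dataset variables train_ds and val_ds, and the concrete search ranges are illustrative assumptions.

```python
import numpy as np
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

def model_init():
    # Re-created for every Optuna trial; "bert-wiki-mlm" is the hypothetical
    # output directory of the pre-training stage sketched earlier.
    return BertForSequenceClassification.from_pretrained("bert-wiki-mlm", num_labels=6)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

def hp_space(trial):
    # Search space limited to learning rate and batch size, as described above;
    # the ranges themselves are illustrative.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [16, 32, 64]
        ),
    }

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="bert-emotion"),
    train_dataset=train_ds,   # tokenized merged Twitter-Emotion training split (placeholder)
    eval_dataset=val_ds,      # validation split used as the search objective (placeholder)
    compute_metrics=compute_metrics,
)

best_run = trainer.hyperparameter_search(
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_accuracy"],
    backend="optuna",
    n_trials=10,
    direction="maximize",
)
print(best_run.hyperparameters)
```

Because the classifier is re-instantiated from the same pre-trained checkpoint via model_init, all trials start from identical weights and differ only in the sampled hyperparameters.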
The best model achieved an average accuracy of 91.3% on the test set and demonstrated strong results for joy (F1-score: 0.94), sadness (F1-score: 0.95), and anger (F1-score: 0.91). The fear (F1-score: 0.88), love (F1-score: 0.83), and surprise (F1-score: 0.72) classes exhibited relatively lower performance because of the smaller amount of training data in these categories and their significant semantic overlap with other emotion categories. The surprise category, the most under-represented in the training data (~3% of samples), was the hardest.
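Per-class scores of this kind can be produced with scikit-learn's classification_report, as in the short sketch below; test_labels and test_predictions are placeholders for the held-out gold labels and the model's predicted label indices.

```python
from sklearn.metrics import classification_report

# Per-class precision, recall, and F1 on the held-out test set.
emotions = ["joy", "sadness", "anger", "fear", "love", "surprise"]
print(classification_report(test_labels, test_predictions,
                            target_names=emotions, digits=2))
```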
This thesis demonstrates the importance of the two-phase method when training a language model from scratch, documents the technical problems solved while setting up GPU infrastructure, and shows the usefulness of the BERT architecture for emotion classification. It also highlights the value of systematic hyperparameter search and a methodical approach in deep learning projects. The project demonstrates that training a state-of-the-art language model from scratch is feasible, provided one follows a scientific mindset, has sufficient computational resources, and applies disciplined, large-scale DevOps practices.


