Πρόβλεψη βαθμολογίας ταινιών από κείμενα κριτικών χρησιμοποιώντας το μοντέλο BERT
Predicting movie ratings from review text using a BERT model

View/ Open
Keywords
BERT modelAbstract
With applications in corporate intelligence, social media monitoring, political analysis, and healthcare, sentiment analysis is a major field of study in natural language processing (NLP). This study investigates the effectiveness of sentiment classification based on deep learning using the BERT model, focusing on movie review datasets. While traditional sentiment analysis techniques such as Naïve Bayes and Support Vector Machines (SVMs) have shown respectable performance, they struggle with complex linguistic structures and context-dependent shifts in sentiment. Text classification accuracy has significantly improved with recent transformer-based architectures, especially BERT, which uses self-attention mechanisms and captures bidirectional context.
This thesis presents a method for sentiment classification, beginning with data collection and preparation. The main dataset used is the IMDB dataset, a popular benchmark for sentiment analysis. Preprocessing techniques like tokenization, stop-word removal, and lemmatization are applied to enhance textual input for deep learning models. The study presents and compares various machine learning and deep learning approaches to demonstrate the superiority of transformer-based models over conventional ones.
According to experimental results, BERT outperforms recurrent neural networks and traditional machine learning techniques in sentiment classification, achieving excellent accuracy and robustness. Its improved precision, recall, and F1-score stem from its ability to capture complex sentiment expressions. However, despite its advantages, BERT has notable limitations, including issues of interpretability, computational complexity, and reliance on large datasets.
The study's findings highlight the potential of transformer-based architectures for real-world sentiment analysis applications. Future research should explore strategies to overcome current limitations, such as integrating multimodal sentiment analysis (combining voice and facial expression data), improving interpretability via attention visualization methods, and increasing inference speed through model distillation. Solving these issues could make sentiment classification models more efficient, adaptable, and understandable for a wide range of uses.