Comparative study of sentiment mining methods in movie reviews
A comparative study of sentiment analysis techniques on movie reviews domain
Παναγιάρης, Νικόλαος Γρ.
Subject heading: Neural networks (Computer science)
Keywords: Support Vector Machines (SVM); Sentiment analysis; Machine learning; Sentiment classification
Sentiment analysis has emerged as a field that attracts significant attention because a wide variety of applications can benefit from its results, such as news analytics, marketing, question answering and knowledge management. The field is, however, still at an early stage of development, and improvements are needed on many fronts, particularly in the performance of sentiment classification. Document-level sentiment classification aims to automatically classify a textual review of a single topic as expressing a positive or a negative sentiment. In general, supervised methods consist of two stages: (i) extraction/selection of informative features and (ii) classification of reviews with learning models such as Support Vector Machines (SVM) and Naive Bayes (NB). SVM have been used extensively and successfully for sentiment learning, whereas deep learning (DL) neural networks have only recently been applied to the task and have not been included in comparative studies in the sentiment analysis literature. In this thesis, we survey and implement several deep learning and deep-learning-inspired approaches, and we present an empirical comparison between conventional machine learning techniques and deep learning methods for document-level sentiment analysis. We discuss the requirements of each approach, the resulting models, and the contexts in which each achieves higher classification accuracy. Our experiments indicated that SVM outperform the more sophisticated DL methods on the benchmark movie reviews dataset. Our results also confirmed some potential limitations of both models that have rarely been discussed in the sentiment classification literature, such as the computational cost of SVM at run time and of DL at training time.
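The two-stage supervised pipeline described in the abstract (feature extraction/selection, then classification with SVM or NB) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the tokenizer, the document-frequency feature selection, the binary bag-of-words features, the Pegasos-style linear SVM trainer without a bias term, and all helper names (`select_features`, `vectorize`, `train_nb`, `train_svm`, etc.) are hypothetical choices made here, and the four-review corpus is toy data rather than the movie reviews benchmark.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer).
    return re.findall(r"[a-z']+", text.lower())


def select_features(docs, max_features=1000, min_df=1):
    # Stage (i): keep the terms that occur in the most documents. Document-frequency
    # selection is an assumption; chi-square or information gain are common alternatives.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    vocab = [term for term, count in df.most_common() if count >= min_df][:max_features]
    return {term: i for i, term in enumerate(vocab)}


def vectorize(doc, vocab):
    # Binary presence features over the selected vocabulary.
    x = [0.0] * len(vocab)
    for token in tokenize(doc):
        if token in vocab:
            x[vocab[token]] = 1.0
    return x


def train_nb(X, y, alpha=1.0):
    # Stage (ii), option A: multinomial Naive Bayes with add-alpha smoothing.
    # Labels are +1 (positive) and -1 (negative); both classes must be present.
    model = {}
    for label in (1, -1):
        rows = [x for x, t in zip(X, y) if t == label]
        counts = [alpha + sum(col) for col in zip(*rows)]
        total = sum(counts)
        model[label] = (math.log(len(rows) / len(X)), [math.log(c / total) for c in counts])
    return model


def predict_nb(model, x):
    # Pick the label with the larger log prior + log likelihood.
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(v * w for v, w in zip(x, log_lik))
    return 1 if score(1) >= score(-1) else -1


def train_svm(X, y, lam=0.01, epochs=100):
    # Stage (ii), option B: linear SVM fitted with Pegasos-style subgradient steps on
    # the L2-regularized hinge loss. Cycles through the data in a fixed order and
    # omits the bias term for brevity (both are simplifying assumptions).
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, target in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = target * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the L2 penalty
            if margin < 1.0:  # hinge loss is active: move w toward target * x
                w = [wi + eta * target * xi for wi, xi in zip(w, x)]
    return w


def predict_svm(w, x):
    # Sign of the linear decision function (ties go to the positive class).
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    train = [
        ("a wonderful, moving film with brilliant acting", 1),
        ("brilliant script and wonderful cast", 1),
        ("dull plot and terrible acting", -1),
        ("a terrible, dull waste of time", -1),
    ]
    vocab = select_features([doc for doc, _ in train])
    X = [vectorize(doc, vocab) for doc, _ in train]
    y = [label for _, label in train]
    nb, w = train_nb(X, y), train_svm(X, y)
    for review in ["wonderful and brilliant", "dull and terrible"]:
        x = vectorize(review, vocab)
        print(review, predict_nb(nb, x), predict_svm(w, x))
```

On this toy data both classifiers label the two held-out phrases correctly. A realistic comparison would use a proper train/test split of a benchmark corpus and tested library implementations, for example scikit-learn's `LinearSVC` and `MultinomialNB`.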