Extraction, analysis and visualization of qualitative information from unstructured business data
Keywords
Machine learning; Unstructured data; Natural language processing
Abstract
One of the open problems that the academic community is trying to solve, and one that troubles the commercial and political sectors, is the analysis of unstructured data. From images and video to sound and text, unstructured data make up the majority of data compared to structured data. Although unstructured data contain a wealth of information, extracting that information accurately and meaningfully remains a challenge in the context of impactful decision making.
This thesis focuses on text analysis based on the emotions that underlie the text. The presented methodological approach makes it possible to spot the emotional patterns that an author is trying to evoke, and then attempts to classify the documents of the corpus based on their emotion vectors. The method is effective regardless of context (news articles, marketing, politics, etc.) and of document length.
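As a rough illustration of what an emotion vector could look like, here is a minimal sketch. The lexicon, the emotion labels, and the function name `emotion_vector` are hypothetical and chosen only for this example; they are not the lexicon or the code used in the thesis. Normalizing by the number of emotion hits is one simple way to make vectors comparable across documents of different length, and is an assumption rather than the thesis's stated normalization.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotions); NOT the lexicon used in the thesis.
TOY_LEXICON = {
    "attack": ["fear", "anger"],
    "threat": ["fear"],
    "victory": ["joy"],
    "celebrate": ["joy"],
    "loss": ["sadness"],
}
EMOTIONS = ["anger", "fear", "joy", "sadness"]


def emotion_vector(text, lexicon=TOY_LEXICON, emotions=EMOTIONS):
    """Return, for each emotion, its share of all emotion hits in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count every emotion attached to every token found in the lexicon.
    counts = Counter(e for tok in tokens for e in lexicon.get(tok, []))
    total = sum(counts.values())
    if total == 0:
        # No lexicon word found: return a zero vector.
        return [0.0] * len(emotions)
    return [counts[e] / total for e in emotions]


vec = emotion_vector("The attack was a threat, but people celebrate the victory.")
print(dict(zip(EMOTIONS, vec)))
```

Documents represented this way can then be compared or grouped by their vectors, independently of how long each document is.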
For this research we use a lexicon that identifies the emotions associated with words, together with several unsupervised machine learning models. Specifically, we use LDA (Latent Dirichlet Allocation), which we found can in many cases produce “distilled emotions”, as we shall see. We also use Mahalanobis distance, One-Class SVM and Isolation Forest in an effort to identify anomalous documents. These methods are applied to a complex dataset that consists mainly of news articles and, secondarily, of ISIS propaganda texts. The propaganda texts are included because emotions are a key pillar of effective propaganda; the state-of-the-art research addresses this extensively, along with other elements of propaganda across the decades. It is worth noting that the findings of the analysis suggest the existence of common practices between Western news agencies and terrorist propaganda, with important similarities in the emotions that authors on both sides are trying to evoke.
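To make the anomaly-detection step more concrete, the following is a minimal sketch of Mahalanobis-distance scoring only, written in plain Python. The two-dimensional vectors are made-up numbers, the helper names are hypothetical, and the small `ridge` term added to the covariance diagonal is an assumption introduced for numerical stability; none of this is taken from the thesis. One-Class SVM and Isolation Forest are usually taken from a library such as scikit-learn and are not shown here.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu, ridge=1e-9):
    """Sample covariance matrix, with a small ridge on the diagonal (assumption)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    diff = [a - b for a, b in zip(x, mu)]
    return sum(d * v for d, v in zip(diff, solve(cov, diff)))


# Made-up 2-D emotion scores for "typical" reference documents.
reference = [[0.30, 0.20], [0.32, 0.22], [0.28, 0.18], [0.31, 0.19], [0.29, 0.21]]
candidate = [0.90, 0.05]  # made-up document with an unusual emotional profile

mu = mean_vector(reference)
cov = covariance(reference, mu)
ref_scores = [mahalanobis_sq(r, mu, cov) for r in reference]
cand_score = mahalanobis_sq(candidate, mu, cov)
print(max(ref_scores), cand_score)
```

A document whose score is far above the scores of the reference documents would be flagged as anomalous; where to place that threshold is a modeling choice that this sketch leaves open.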