Συλλογή δεδομένων και εξόρυξη γνώσης από κοινωνικά δίκτυα : εφαρμογή data analysis τεχνικών σε σύνολα δεδομένων από το κοινωνικό δίκτυο Twitter
Data mining and knowledge discovery from social media : implementation data analysis methods on data collection from Twitter
Keywords: K-means ; Ward ; Ανάλυση δεδομένων ; Εξόρυξη γνώσης ; Data mining ; Twitter ; Συσταδοποίηση ; Ιεραρχική ανάλυση ; Κοινωνικά δίκτυα ; Social media ; Data analysis ; Clustering ; Agglomerative clustering ; Non-negative Matrix Factorization (NMF)
This thesis was carried out as part of the undergraduate degree program in Digital Systems at the University of Piraeus, a curriculum oriented mainly toward Network-Oriented and Telecommunication Systems and Services, which aims to develop future scientists capable of contributing to the development, implementation and management of modern digital systems. To this end, the subject of the thesis is the recently developed field of Big Data: its management and the extraction of knowledge from the web, and from social networks in particular. We live in an age in which people devote a significant amount of their time to social networks, where they consume and produce volumes of information that would have been unimaginable in earlier times. Managing all this information has multifaceted benefits. With proper treatment of the data, we can extract valuable knowledge and conclusions about most aspects of human activity, because the disclosed information comes from a huge and diverse population of individuals, in an environment that is sufficiently similar to real society. The solution to the problem of extracting knowledge from data comes from the IT industry, and more specifically from the technologies of “data mining” and “data analysis”. In this document we first present how data can be exported from the social network Twitter, and then how the data are processed so that they can be “fed” to machine learning algorithms and clustered according to their content. Finally, we deal with “topic detection”, i.e. a set of tools for discovering hidden themes and concepts in our data collections.