Αλγοριθμική άντληση δομικής πληροφορίας από ψηφιακά κοινωνικά δίκτυα και υπολογισμός μέτρων κεντρικότητας
Algorithmic extraction of structural information from social networks and computation of associated centrality measures
The title of this Master's thesis is «Algorithmic extraction of structural information from social networks and computation of associated centrality measures». In simpler terms, in the context of this work we retrieved data from a social network (specifically Twitter) and, after processing that data, obtained a graph. The graph represents a small part of the network: starting from our own Twitter profile, it extends to the followers of our followers, i.e. up to second-degree neighbours in the network. As part of this postgraduate thesis we wrote Python code that gave us the ability to retrieve this data from the network. The graph itself was visualised with the help of the Gephi tool. Great emphasis was placed on the data-mining step: nodes that had already been collected were not collected again, because our goal was to create a graph in which each node represents a unique Twitter user.

This thesis can be divided into three parts. First, the extraction of knowledge from the social network and the writing of the results to a .txt file. Second, the conversion of the .txt file to a .gml file, so that Gephi can import it. Third, the import of the .gml file into Gephi and the study of the resulting graph, especially its centrality measures.

Centrality measures quantify how 'central', or important, a node is in a graph, and they are of interest to many people around the world, in many different industries. In transportation, for example, centrality measures inform the planning of a city's road network: the most central points are those through which a large share of the population is likely to pass. Airlines also study the centrality measures of their route networks, watching passenger movement as well as the strategic moves of competing companies.
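The first part, collecting followers of followers, amounts to a breadth-first crawl of depth two. The source does not name a particular Twitter client library, so the sketch below uses a hypothetical `fetch_followers` function as a stand-in for the actual API call; the toy `network` dictionary is illustrative data, not real Twitter output. It shows the de-duplication idea from the text: a user already visited is never expanded again, so each node is unique.

```python
from collections import deque

def crawl_followers(fetch_followers, root, max_depth=2):
    """Breadth-first crawl of a follower network up to max_depth,
    recording each directed edge (follower -> followed) once.
    Already-visited users are not expanded again, so every user
    appears as a unique node in the resulting graph."""
    edges = set()
    visited = {root}
    queue = deque([(root, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # second-degree neighbours are kept but not expanded
        for follower in fetch_followers(user):
            edges.add((follower, user))
            if follower not in visited:
                visited.add(follower)
                queue.append((follower, depth + 1))
    return visited, edges

# Hypothetical stand-in for the real API call:
network = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["carol", "dave"],
}
nodes, edges = crawl_followers(lambda u: network.get(u, []), "me")
```

Here `carol` follows both `alice` and `bob`, but enters the node set only once, which is exactly the uniqueness property described above.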
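The second part, converting the .txt results into a .gml file for Gephi, can be sketched as follows. The exact layout of the thesis's .txt file is not given, so this assumes one "source target" edge per line; the GML emitted is the minimal node/edge structure that Gephi's GML importer accepts.

```python
def edges_to_gml(txt_lines):
    """Convert a plain edge list ("source target" per line) into a
    minimal GML document. Node ids are assigned in order of first
    appearance, so each user becomes exactly one node."""
    ids, edges = {}, []
    for line in txt_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        for name in parts:
            if name not in ids:
                ids[name] = len(ids)
        src, dst = parts
        edges.append((ids[src], ids[dst]))
    out = ["graph [", "  directed 1"]
    for name, nid in ids.items():
        out.append(f'  node [ id {nid} label "{name}" ]')
    for s, t in edges:
        out.append(f"  edge [ source {s} target {t} ]")
    out.append("]")
    return "\n".join(out)

gml = edges_to_gml(["alice me", "bob me", "carol alice"])
```

Writing the returned string to a file with a `.gml` extension yields something Gephi can open directly.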
In this way (by using centrality measures) airlines identify the most 'commercial' airports, which is very important information for how a company will act in the near future. Based on centrality measures, among other parameters, airlines often choose to merge, hoping that with a few hub nodes they can cover almost the entire network.

It would be very interesting to watch our network change step by step as a function of time, but this lies outside the scope of this thesis and enters the territory of further research. If, in a subsequent step, we could see how the network changes over time, we might also be able to make forecasts. And if we could predict the development of the network, we would have gained new knowledge that could be used in fields outside IT: in medicine, for example, to forecast epidemics, since epidemics spread in much the same way that a social network grows. This could perhaps be a continuation of this thesis at the doctoral level, because developing code for a sizable part of the Twitter network, studying the corresponding graph, observing it over time, and comparing the results at particular moments is quite complex and very time-consuming work.

What we wanted to achieve in this thesis, and succeeded in, is showing that our idea works, even on this small network. We were able to 'pull' data from the social network and, based on that data, build a graph. From this multidimensional graph we drew important lessons, such as the criteria by which someone decides to 'follow' someone else in a social network such as Twitter. We also obtained statistical results, such as the average number of connections per node and the probability that a second-degree neighbour follows us on Twitter.
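To make the two recurring quantities concrete, here is a minimal sketch of the simplest centrality measure (degree centrality) and of the average-connections-per-node statistic mentioned above. Gephi computes richer measures (betweenness, closeness, and so on), and the tiny undirected graph used here is illustrative, not thesis data.

```python
def degree_centrality(adj):
    """Degree centrality of each node: the fraction of the other
    n-1 nodes it is connected to (adjacency as dict of sets)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def average_degree(adj):
    """Mean number of connections per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

# Hypothetical star-shaped toy graph: "a" is the hub.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
}
centrality = degree_centrality(adj)
avg = average_degree(adj)
```

In this star graph the hub `a` reaches every other node (centrality 1.0), mirroring the 'hub airport' situation described above: a few highly central nodes cover the whole network.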