Αλγοριθμική άντληση δομικής πληροφορίας από ψηφιακά κοινωνικά δίκτυα και υπολογισμός μέτρων κεντρικότητας
Algorithmic extraction of structural information from social networks and computation of associated centrality measures

Keywords
Twitter ; Social networks ; Graph ; Python ; Knowledge mining ; Algorithms

Abstract
This Master's thesis is titled «Algorithmic extraction of structural information from social networks
and computation of associated centrality measures». In simpler terms, in the context of this work we
collected data from a social network (specifically Twitter) and, after processing that data, constructed
a graph. The graph represents a small part of the network: the subnetwork obtained by starting from
our own Twitter profile and expanding outwards to the Followers of our Followers, that is, up to
second-degree neighbours. For this purpose we wrote code in Python, which allowed us to retrieve
the data from the network, while the visualisation of the resulting graph was done with the Gephi tool.
Particular emphasis was placed on the data-mining step: nodes that had already been collected were
not retrieved again, since our goal was a graph in which each node represents a unique Twitter user.
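As a rough illustration of this crawling step, the following is a minimal sketch of a breadth-first
expansion out to second-degree followers. The function get_follower_ids is a hypothetical placeholder
for whatever Twitter API client is actually used; a real implementation would also have to respect the
API's rate limits.

    from collections import deque

    def get_follower_ids(user_id):
        """Hypothetical placeholder: return the follower IDs of a user
        via the Twitter API client used for the crawl."""
        raise NotImplementedError

    def crawl_followers(seed_id, max_depth=2):
        """Breadth-first crawl from a seed profile out to its
        followers-of-followers, keeping each user only once."""
        seen = {seed_id}                  # each node must be a unique user
        edges = []                        # (follower, followed) pairs
        queue = deque([(seed_id, 0)])
        while queue:
            user, depth = queue.popleft()
            if depth >= max_depth:
                continue
            for follower in get_follower_ids(user):
                edges.append((follower, user))
                if follower not in seen:  # skip users already collected
                    seen.add(follower)
                    queue.append((follower, depth + 1))
        return edges

The resulting edge list can then be written line by line to a plain .txt file, which is the output of the
first part of the work described below.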
each node represents a unique user of Twitter. This master thesis, we could say that is divided into 3
parts. First, the extraction of knowledge from a social network, and the introduction of the results in a
file .txt. Second part, the conversion from .txt format file to .gml file, Because Gephi can read only
.gml files. And the third part, the introduction of file .gml in Gephi, and the study of the graph coming
out, especially the centrality measures.
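As an illustration of the second part, a conversion of this kind can be sketched with the NetworkX
library; the file names edges.txt and graph.gml are assumptions for the example, not the actual files
produced in the thesis.

    import networkx as nx

    # Read the crawled edge list (one "follower followed" pair per line)
    # into a directed graph, then write it out in GML for Gephi.
    G = nx.read_edgelist("edges.txt", create_using=nx.DiGraph(), nodetype=str)
    nx.write_gml(G, "graph.gml")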
Centrality measures quantify how 'central', or important, a node is within a graph, and they are of
interest to many people in many different industries around the world. In transportation, for example,
centrality measures are studied when planning a city's road network, in order to identify the most
central points, that is, the points through which a large share of the population is likely to pass.
Airlines likewise study the centrality of their route networks, monitoring passenger flows as well as
the strategic moves of other companies. In this way, by using centrality measures, an airline can
identify the airports that are most 'commercial' for it, which is very important information for deciding
how the company will act in the near future. Based on centrality measures, among other parameters,
airlines often even choose to merge, in the hope that with a few well-connected hub nodes they can
cover almost the entire network.
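In the thesis itself the centrality measures are examined inside Gephi, but equivalent quantities could
also be computed directly in Python; the following is a small sketch of that idea, again assuming the
hypothetical graph.gml file from the example above.

    import networkx as nx

    # Load the exported graph and compute standard centrality measures.
    G = nx.read_gml("graph.gml")

    degree      = nx.degree_centrality(G)       # share of nodes each node touches
    betweenness = nx.betweenness_centrality(G)  # how often a node lies on shortest paths
    closeness   = nx.closeness_centrality(G)    # inverse of average distance to all others

    # The five most 'central' users by betweenness, for example:
    top = sorted(betweenness, key=betweenness.get, reverse=True)[:5]
    print(top)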
It would be very interesting to observe how our network changes step by step as a function of time,
but this lies outside the scope of the Master's thesis and enters the realm of research. As a
subsequent step, if we could see how the network evolves over time, we might also be able to make
forecasts, and if we could predict the development of the network we would have discovered new
knowledge that could be used in industries outside IT, for example in medicine for forecasting
epidemics, since epidemics spread in much the same way as a social network grows. This could
perhaps be the continuation of this thesis at the doctoral level, because developing code for a sizable
portion of the Twitter network, studying the corresponding graph, observing it as a function of time,
and comparing the results at particular moments is a complex and very time-consuming task.
What we wanted to achieve in this thesis, and did achieve, is to show that our idea works even on this
small network. We were able to 'pull' data from the social network and, based on that data, to build a
graph. From this multidimensional graph we drew important lessons, such as the criteria by which
someone decides to 'follow' someone else in a social network such as Twitter. We also obtained
statistical results, such as the average number of connections per node and the probability that a
second-degree neighbour follows us back on Twitter, and so on.
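Simple statistics of this kind follow directly from the graph; as a rough sketch over the assumed
graph.gml file, the average number of connections per node and the share of reciprocated 'follow'
relations could be computed as follows.

    import networkx as nx

    G = nx.read_gml("graph.gml")

    # Average number of connections (outgoing 'follows') per node.
    avg_degree = G.number_of_edges() / G.number_of_nodes()

    # Fraction of 'follow' edges that are reciprocated, as a crude proxy
    # for the chance that a user we reach also follows back.
    follow_back = nx.reciprocity(G) if G.is_directed() else None

    print(f"average connections per node: {avg_degree:.2f}")
    print(f"reciprocated follows: {follow_back}")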