Community detection in signed and directed graphs
Master Thesis
Author
Ψυλλάκος, Νίκος
Psyllakos, Nikos
Date
2021-06-27Advisor
Χαλκίδη, ΜαρίαHalkidi, Maria
View/ Open
Keywords
Υπογεγραμμένοι και κατευθυντικοί γράφοι ; Ανάλυση κοινωνικών δικτύων ; Συσταδοποίηση γράφων ; Ανίχνευση κοινοτήτων ; Μέτρο ομοιότητας ; Affinity propagation ; Spectral clustering ; Python ; Κατευθυντικότητα ; ΠόλωσηAbstract
In recent 2 decades, one of the most popular methods for processing data, is to convert it to a network, where data units are vertices and their relations are the links that connects them.
Such Networks in a clustering problem are usually represented as graphs where each element to be clustered is represented as a node and the distance between two elements is modeled by a certain weight on the edge linking the nodes. Graph clustering aims at partitioning a set of nodes into different groups called clusters or communities that share some form of similarity, similarity score can be formulated by using a distance-based criterion, a topology structural criterion or an “attitude” (polarity) criterion. These clusters will help us to explore information hidden in the data , there is a wide area of applications as e.g. Social Network Analysis, Statistical Data Analysis, Machine Learning, Data mining, Consuming Behavior Analysis, VLSI design, Computer graphics and Gene analysis.
This particular Thesis is putting under the spotlight Signed and Directed graphs and examines two community detection algorithms that cluster nodes with similar characteristics, the goal of the procedure is to emerge the Similarity score as the main and most important factor of Graph Clustering between nodes based on the connectivity pattern they follow including both directionality and polarity of the links which connects them.
The initial step is to work with the negative and positive subgraph of the examined Data Set. For each Subgraph we apply Coupling (Reference) and Cocitation as the Methodologies by which Directionality dominates criteria to reach Similarity and we calculate the appropriate co-citation and co-reference matrices. The Reference or bibliographic coupling Matrix of two nodes in a directed network is the number of nodes to which both these two nodes point on the other hand the Cocitation Matrix of two nodes in a directed network is the number of nodes that point to these both exact nodes. The procedure continues with the similarity Matrix calculation normalized by taking account of the degree of each node and under the influence of Amsler [Amsler, 1972] who pointed out that Co-citation and Bibliography coupling can be combined. And finally we apply two clustering algorithms affinity propagation and spectral clustering, both standalone and combined.
In this Thesis, we provide a concluded implementation of these proposed algorithms in Python,
and the datasets that were used to evaluate in practice their precision and certainty.
We display that these two algorithms have the theoretically expected results for both random
Generated signed directed graphs and real networks data sets.