Community detection in signed, directected graphs

Ανδρουλιδάκης, Εμμανουήλ; Androulidakis, Manolis

Master Thesis

Author

Ανδρουλιδάκης, Εμμανουήλ

Androulidakis, Manolis

Date

2021-02

Abstract

Graph clustering is a fundamental technique that partitions similar nodes into clusters. It is of great importance in large-scale networks and is widely adopted in a variety of scientific areas where it is crucial to identify patterns or structures quickly, such as Social Network Analysis, Statistical Data Analysis, Data Mining, Machine Learning, Biology and others. In this thesis, we focus on signed directed graphs and examine a community detection algorithm that groups together nodes with similar characteristics. The single most significant factor for efficient graph clustering is the computation of the similarity score among nodes. Our main challenge is to establish a similarity relationship among nodes based on the connectivity patterns they follow, i.e., combine the concepts positive and negative co-citation and co-reference. First, we partition the original graph into its positive and negative subgraphs and work in parallel on both subgraphs. For each subgraph, we apply citation analysis and binding theory to calculate the co-citation and co-reference matrices, respectively. The former contains the number of nodes that two nodes both point to, while the latter gives the number of nodes that commonly point to two nodes. We use normalized mathematical models to smoothen data, factoring in the degree of each node. Further, to remove outliers, we tested taking into consideration the balance among positive and negative incoming and outgoing links among nodes. That is, the number of common links between two nodes counts less when these nodes share significantly less positive than negative out-links, and vice versa. In this work, we also provide a complete implementation of the proposed algorithm in Python, as well as the datasets that we used to experimentally evaluate its correctness and accuracy. With this, we prove that the proposed algorithm returns the expected results for randomly generated signed directed graphs and scales up to thousands of nodes, depending on the density of the original graph.

Postgraduate Studies Programme

Ψηφιακά Συστήματα και Υπηρεσίες

Department

Σχολή Τεχνολογιών Πληροφορικής και Επικοινωνιών. Τμήμα Ψηφιακών Συστημάτων

Number of pages

Language

English

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/13345
http://dx.doi.org/10.26267/unipi_dione/768

Collections

Τμήμα Ψηφιακών Συστημάτων

Show full item record

Except where otherwise noted, this item's license is described as
Αναφορά Δημιουργού 3.0 Ελλάδα