Study and development of approaches for evaluating clustering results on graphs
Keywords
Graph clustering; Ground truth; Cluster validity indices
Abstract
Graphs are the mathematical representation of networks, that is, of systems whose structural
elements are interconnected. A graph consists of nodes, which represent the data, and edges, which
represent the relationships between nodes. The structure of a graph carries information about its
nodes: nodes that belong to the same cluster have more similar characteristics to one another than
to nodes that belong to other clusters.
This thesis studies the results of clustering algorithms on graph data sets by evaluating them
with evaluation indices that have been proposed in the relevant literature. More specifically,
three graph data sets are clustered using two popular clustering algorithms, Spectral and
Louvain. Each algorithm produces results for several numbers of clusters, and for each result
the quality of the clusters is evaluated with two popular indices, modularity and conductance.
The modularity index compares the linkage within a cluster, which should be denser, with the
linkage between that cluster and its neighboring clusters. The conductance index measures the
number and the weights of the edges that are cut when the graph is split into subgraphs.
In addition, three evaluation indices developed and proposed in papers from the same field of
interest are used. The Q-graph index uses degeneracy and graph density, the latter referring to
the density of nodes and edges, to evaluate the connectivity of nodes within and between
clusters. The CDS index combines structural cohesion and graph density, where cohesion evaluates
the distance between nodes and the possibility of their separation. The last index, GS*, is
based on the silhouette index: it evaluates the similarity of nodes by comparing the distances
between nodes of the same cluster with their distances to nodes of neighboring clusters. The
Python environment provides the tools and libraries used for the experimental study.
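As an illustration of the two standard indices named above, the following minimal sketch computes modularity and conductance in plain Python. It assumes the common unweighted textbook definitions (modularity as the intra-cluster edge fraction minus its expected value, conductance as cut size over the smaller volume), which may differ from the exact variants used in the thesis. The toy graph (two triangles joined by one edge) and the function names are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return a mapping node -> degree for an undirected, unweighted edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def modularity(edges, clusters):
    """Sum over clusters of (intra-edge fraction) - (expected fraction)."""
    m = len(edges)
    deg = node_degrees(edges)
    q = 0.0
    for cluster in clusters:
        intra = sum(1 for u, v in edges if u in cluster and v in cluster)
        volume = sum(deg[n] for n in cluster)
        q += intra / m - (volume / (2 * m)) ** 2
    return q


def conductance(edges, cluster):
    """Edges leaving the cluster divided by the smaller of the two volumes."""
    deg = node_degrees(edges)
    cut = sum(1 for u, v in edges if (u in cluster) != (v in cluster))
    vol_in = sum(d for n, d in deg.items() if n in cluster)
    vol_out = sum(d for n, d in deg.items() if n not in cluster)
    return cut / min(vol_in, vol_out)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} linked by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

natural = [{0, 1, 2}, {3, 4, 5}]  # follows the two triangles
mixed = [{0, 1, 3}, {2, 4, 5}]    # deliberately poor split

q_natural = modularity(edges, natural)
q_mixed = modularity(edges, mixed)
phi_natural = conductance(edges, natural[0])
phi_mixed = conductance(edges, mixed[0])

print(f"natural split: modularity={q_natural:.3f}, conductance={phi_natural:.3f}")
print(f"mixed split:   modularity={q_mixed:.3f}, conductance={phi_mixed:.3f}")
```

On this toy graph the natural split yields modularity 5/14 and conductance 1/7, while the mixed split yields modularity -3/14 and conductance 5/7. Higher modularity and lower conductance both indicate the better-separated clustering.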
To interpret the results, a small but well-known graph data set is used to visualize the
clusters and to observe how the cluster structure affects the results of the cluster validation
indices. Cluster validation indices play a significant role in ensuring the reliability of the
results. The experimental study concludes that there is no single best index. Instead, a
combination of indices should be used, chosen according to the graph structure.