Study and development of approaches for evaluating clustering results on graphs

Γαγλία, Ιωάννα

Master Thesis

Author

Γαγλία, Ιωάννα

Date

2022-06

Abstract

Graphs are the mathematical representation of networks whose structural elements are interconnected. A graph consists of nodes, which represent the data, and edges, which represent the relationships between nodes. The structure of a graph carries information about its nodes: nodes that belong to the same cluster are more similar to each other than to nodes in other clusters. This thesis studies the results of clustering algorithms on graph data sets by evaluating them with validation indices proposed in the relevant literature. More specifically, three graph data sets are clustered using two popular clustering algorithms, Spectral and Louvain. Each algorithm produces results for several numbers of clusters, and the quality of the clusters in each result is evaluated with two popular indices, modularity and conductance. The modularity index compares the intra-cluster linkage, which should be denser, with the inter-cluster linkage to neighboring clusters. The conductance index considers the number and weights of the edges cut when the graph is split into subgraphs. Three further evaluation indices, developed and proposed in papers from the same field, are also used. The Q-graph index uses degeneracy and graph density, that is, the density of nodes and edges, to evaluate the connectivity of nodes within and between clusters. The CDS index combines structural cohesion and graph density, where cohesion evaluates the distance between nodes and the possibility of their separation. Finally, the GS* index, which is based on the silhouette index, evaluates the similarity of nodes by comparing the distance between nodes in the same cluster with their distance to nodes of neighboring clusters.
The experimental study is carried out in Python, which provides the necessary tools and libraries. To interpret the results, a small but well-known graph data set is used to visualize the clusters and to observe how cluster structure affects the values of the cluster validation indices. Cluster validation indices play a significant role in ensuring the reliability of the results. The experimental study concludes that no single index is best; instead, a combination of indices should be chosen according to the graph structure.
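
As a rough illustration of the two standard indices named in the abstract, the following minimal Python sketch computes modularity and conductance for a fixed partition of a toy graph. It is not taken from the thesis: the graph (two triangles joined by one edge), the names `EDGES` and `CLUSTERS`, and the helper functions are all hypothetical, and the graph is assumed to be undirected, unweighted and free of self-loops.

```python
from collections import defaultdict

# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
# Hypothetical partition into two clusters, one per triangle.
CLUSTERS = [{0, 1, 2}, {3, 4, 5}]


def degrees(edges):
    """Return the degree of every node in an undirected, unweighted edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def modularity(edges, clusters):
    """Newman modularity: sum over clusters of L_c/m - (vol_c / 2m)^2."""
    m = len(edges)
    deg = degrees(edges)
    label = {node: i for i, cluster in enumerate(clusters) for node in cluster}
    intra = defaultdict(int)  # number of edges with both ends in the same cluster
    for u, v in edges:
        if label[u] == label[v]:
            intra[label[u]] += 1
    q = 0.0
    for i, cluster in enumerate(clusters):
        vol = sum(deg[n] for n in cluster)  # total degree of the cluster
        q += intra[i] / m - (vol / (2 * m)) ** 2
    return q


def conductance(edges, cluster):
    """Cut size divided by the smaller of the two volumes on either side of the cut."""
    deg = degrees(edges)
    inside = set(cluster)
    cut = sum(1 for u, v in edges if (u in inside) != (v in inside))
    vol_in = sum(d for n, d in deg.items() if n in inside)
    vol_out = sum(deg.values()) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


if __name__ == "__main__":
    print("modularity:", modularity(EDGES, CLUSTERS))
    print("conductance of first cluster:", conductance(EDGES, CLUSTERS[0]))
```

On this toy graph, higher modularity and lower conductance both indicate that the two triangles are well separated. Weighted graphs would need edge weights in place of the edge counts used here.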

Postgraduate Studies Programme

Information Systems and Services

Department

School of Information and Communication Technologies. Department of Digital Systems

Number of pages

Language

Greek

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/14419
http://dx.doi.org/10.26267/unipi_dione/1842

Collections

Department of Digital Systems
