Ζητήματα ομοιότητας στην εξόρυξη γνώσης: μεθοδολογίες και τεχνικές
Similarity issues in data mining: methodologies and techniques
Ντούτση, Ειρήνη Χριστόφορος
Subject: Computer networks ; Data mining ; Online social networks ; Social networks
The number of patterns extracted by Knowledge Discovery and Data Mining (KDD) is growing rapidly, which poses new challenges for their management. One of the most important operations on extracted pattern sets is dissimilarity assessment, which raises many research issues and supports a variety of important applications. This dissertation studies several issues that arise during the pattern dissimilarity assessment process. First, we propose a generic framework for comparing arbitrarily complex patterns defined over raw data and over other patterns. Next, we study specific dissimilarity problems for the most popular pattern types, namely frequent itemsets, decision trees and clusters. For frequent itemsets, we study how the mining parameters affect the dissimilarity assessment process. For decision trees, we propose a framework that evaluates dissimilarity both between decision trees and between classification datasets. Finally, for clusters, we propose dissimilarity measures between clusters and between clusterings, which we then employ for change detection in dynamic populations. All of the above was studied under the assumption that patterns of any type are composed of a structure component and a measure component, which opens the way toward a unified model for KDD results.
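The structure-plus-measure view of a pattern can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the `Pattern` class, the use of Jaccard distance for the structure component, the absolute difference for the measure component, and the `weight` parameter that combines the two are all hypothetical choices, shown here for a frequent itemset with its support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern: a structure (item set) plus a measure (support)."""
    structure: frozenset
    measure: float  # assumed to lie in [0, 1], e.g. the support of an itemset


def structure_dissimilarity(a: Pattern, b: Pattern) -> float:
    """Jaccard distance between the two item sets (an illustrative choice)."""
    union = a.structure | b.structure
    if not union:
        return 0.0  # two empty structures are treated as identical
    return 1.0 - len(a.structure & b.structure) / len(union)


def measure_dissimilarity(a: Pattern, b: Pattern) -> float:
    """Absolute difference between the two measure values."""
    return abs(a.measure - b.measure)


def pattern_dissimilarity(a: Pattern, b: Pattern, weight: float = 0.5) -> float:
    """Weighted average of the structure and measure dissimilarities.

    `weight` is a hypothetical parameter: 1.0 compares structure only,
    0.0 compares the measure only.
    """
    return (weight * structure_dissimilarity(a, b)
            + (1.0 - weight) * measure_dissimilarity(a, b))


if __name__ == "__main__":
    p = Pattern(frozenset({"a", "b"}), 0.5)
    q = Pattern(frozenset({"b", "c"}), 0.25)
    print(pattern_dissimilarity(p, q))
```

Separating the two components keeps the comparison of what a pattern describes apart from the comparison of how strongly the data supports it, so either part could be swapped for a measure better suited to decision trees or clusters.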