Clustering algorithm selection by meta-learning
Keywords
Clustering; Data characterization; Meta-learning; Algorithm ranking; Algorithm selection; Meta-knowledge
Abstract
Data clustering attempts to partition a dataset into groups of objects based on the similarities between those objects. Because the task is unsupervised, finding a good-quality solution can be a complex process. A wide range of clustering algorithms is currently available, and selecting the best one for a given problem can be slow and expensive: for every new clustering dataset, a data scientist must first test each candidate algorithm exhaustively to find the most suitable one. A system that recommends a clustering algorithm and guides the user toward the right choice would therefore be of significant benefit to the scientific community. Rice formulated the Algorithm Selection Problem (ASP) in 1976, postulating that the performance of an algorithm can be predicted from the structural features of the problem. Meta-learning has been used successfully for such algorithm recommendation tasks. It applies machine learning to induce meta-models capable of predicting the best algorithm for a new dataset from meta-attributes that characterize the data. Experimental results show that the recommendation improves when these meta-attributes are used, and that a system can recommend a clustering algorithm for an "unknown" dataset, with significant accuracy, by first examining only its meta-attributes. This Master's thesis also discusses the relevance of each meta-feature to the recommendation.
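The recommendation idea described in the abstract can be illustrated with a minimal sketch. It is not the method used in the thesis: the three meta-attributes, the nearest-neighbour meta-model, the function names `meta_features` and `recommend`, and the toy meta-knowledge base with its algorithm labels are all assumptions made up for illustration.

```python
from math import log, sqrt
from statistics import mean, pstdev


def meta_features(rows):
    """Characterize a dataset (list of equal-length numeric rows).

    Hypothetical meta-attributes: log of the number of instances,
    log of the number of features, and mean per-feature standard deviation.
    """
    n_instances = len(rows)
    n_features = len(rows[0])
    columns = list(zip(*rows))
    return [log(n_instances), log(n_features), mean(pstdev(c) for c in columns)]


def recommend(meta_base, query):
    """1-nearest-neighbour meta-model over (meta-feature vector, algorithm) pairs."""
    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Return the algorithm label attached to the closest stored dataset.
    return min(meta_base, key=lambda entry: distance(entry[0], query))[1]


# Invented meta-knowledge base: in practice each label would come from
# running every candidate algorithm on the dataset and ranking the results.
META_BASE = [
    (meta_features([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 5.0]]), "k-means"),
    (meta_features([[0.0, 0.0, 9.0], [1.0, 8.0, 0.0], [7.0, 1.0, 3.0]]), "DBSCAN"),
]

new_dataset = [[1.0, 1.2], [0.9, 1.0], [6.0, 6.1], [6.2, 5.9]]
print(recommend(META_BASE, meta_features(new_dataset)))
```

The expensive step, evaluating every algorithm on every dataset, happens once when the meta-knowledge base is built; a new dataset then only needs its meta-attributes computed before a recommendation can be made.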