Εξόρυξη δεδομένων σε ανοιχτά διασυνδεδεμένα δεδομένα
Data mining in linked open data
Keywords: Semantic Web; Open data; Data mining; Linked open data; Algorithms
The purpose of this thesis is twofold. First, we aim to exploit the potential of linked open data and the information it can provide. Second, we apply data mining techniques to linked data in order to discover the knowledge hidden in it. The ever-increasing use of the Internet has turned the World Wide Web into arguably the largest repository of data and information. Linked open data contributes by publishing and interlinking structured information on the Web so that machines can interpret it through the Semantic Web. Such data is represented in RDF (Resource Description Framework) and queried with the SPARQL language. The rapid growth and usefulness of linked data have prompted governments, public institutions, museums, encyclopedias, libraries and others to take part in this effort. One example is DBpedia, a project that extracts structured information from Wikipedia and publishes it for linking and reuse under the linked open data principles. Using SPARQL queries on DBpedia, we extracted information about 2000 films, to which we then applied data mining techniques. Data mining uses algorithms based on statistics and machine learning to analyze and process large databases and extract useful information from them. More specifically, we applied classification techniques to label a film as "good" or "bad" based on the film characteristics we collected. We then studied, customized and evaluated the classification algorithms.
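The extraction step described above could look roughly like the following sketch. It is not the query used in the thesis: the abstract does not list which film characteristics were collected, so the properties selected here (`dbo:director`, `dbo:runtime`) and the helper names `build_film_query`, `build_request_url` and `rows_from_results` are illustrative assumptions. The sketch only builds the query and the request URL and flattens a response in the standard SPARQL JSON results format; it does not contact the endpoint.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_film_query(limit=2000):
    """Return a SPARQL query selecting films with a few attributes.

    The selected properties are assumptions made for illustration only.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?film ?title ?director ?runtime WHERE {{
  ?film a dbo:Film ;
        rdfs:label ?title ;
        dbo:director ?director ;
        dbo:runtime ?runtime .
  FILTER (lang(?title) = "en")
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query):
    """Encode the query as a GET request asking for JSON results."""
    params = urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return f"{DBPEDIA_ENDPOINT}?{params}"


def rows_from_results(payload):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    bindings = payload["results"]["bindings"]
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in bindings
    ]
```

Fetching `build_request_url(build_film_query())` with any HTTP client and passing the decoded JSON to `rows_from_results` would yield one dictionary per film, which can then serve as a row of features for a classifier.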