
Field | Value | Language
dc.contributor.advisor | Πρέντζα, Ανδριάνα (Prentza, Andriana)
dc.contributor.advisor | Peris, Ricardo Jimenez
dc.contributor.author | Κιούρτης, Αθανάσιος (Kiourtis, Athanasios)
dc.date.accessioned | 2015-09-26T11:29:17Z
dc.date.available | 2015-09-26T11:29:17Z
dc.date.issued | 2015-02-13
dc.identifier.uri | https://dione.lib.unipi.gr/xmlui/handle/unipi/7821
dc.format.extent | 120 | el
dc.language.iso | en | el
dc.publisher | Πανεπιστήμιο Πειραιώς (University of Piraeus) | el
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | *
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | *
dc.subject | Data mining | el
dc.subject | Database management | el
dc.subject | Electronic data processing -- Distributed processing | el
dc.subject | Εξόρυξη δεδομένων (Data mining) | el
dc.subject | Βάσεις δεδομένων -- Διαχείριση (Databases -- Management) | el
dc.subject | Ηλεκτρονική επεξεργασία δεδομένων (Electronic data processing) | el
dc.title | Parallel Fuzzy K-Means clustering on top of an OLAP database | el
dc.type | Master Thesis | el
dc.contributor.department | Σχολή Τεχνολογιών Πληροφορικής και Επικοινωνιών. Τμήμα Ψηφιακών Συστημάτων (School of Information and Communication Technologies. Department of Digital Systems) | el
dc.identifier.call | 005.7422 ΚΙΟ | el
dc.description.abstractEN | The era we live in is defined by the big data phenomenon. Many enterprises store large amounts of data for analysis, making data analysis more and more important. But how can anyone gain insights from this rapidly growing data? This is where data mining comes in, the field this thesis investigates, and in particular clustering, whose algorithms group unlabeled data into multiple clusters. Since clustering has to be performed in a distributed way, the thesis analyzes the Hadoop framework, which executes parallel programs on a cluster of machines and stores the data in a distributed file system. Parallel execution is achieved through the MapReduce paradigm, which expresses complex computations as operations over sets of <key, value> pairs so that many tasks can run concurrently. Distributed storage is provided by the Hadoop Distributed File System (HDFS), a file system that offers scalable and reliable data storage. Mahout is one of the projects that use Hadoop to apply clustering, classification and collaborative-filtering techniques to large datasets. Among the available clustering algorithms, the thesis focuses on a fuzzy clustering algorithm, Fuzzy k-Means, which groups data into clusters by assigning each data point a different degree of membership in each cluster. The main problem to be solved is how to cluster data stored in an OLAP database in parallel. To that end, the thesis studies how Mahout implements Fuzzy k-Means using Hadoop's components, and then proposes a similar approach that runs on top of an OLAP database. More specifically, instead of MapReduce, the new Fuzzy k-Means implementation uses a three-step <key, value> pair scheme (Map, Reduce and FinalReduce). In the first step, the clustering work is split and assigned to different threads, each of which emits its own <key, value> pairs.
In the second step, each thread, without waiting for the other threads to finish their first step, takes the <key, value> pairs it produced in the previous step and emits its own <key, value> pairs. The last step takes place once all threads have finished the two steps above: the results of the second step are merged to produce the final clustering result. In addition, instead of HDFS, the implementation uses an OLAP database for storage. A prototype of this approach has been developed, and preliminary tests were run comparing the different Fuzzy k-Means clustering implementations. Finally, the thesis concludes that more attention should be given to data that evolve day by day, since mining and extracting information from them is becoming increasingly complicated, while also being a very challenging and attractive task. | el
dc.corporate.name | Universidad Politécnica de Madrid. Escuela Técnica Superior de Ingenieros Informáticos | el
dc.contributor.master | Ψηφιακά Συστήματα και Υπηρεσίες (Digital Systems and Services) | el
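The Map, Reduce and FinalReduce steps described in the abstract can be illustrated with a small Fuzzy k-Means loop in Python. This is a minimal sketch under stated assumptions, not the thesis's prototype: the points sit in an in-memory list rather than an OLAP database, distances are Euclidean, the fuzziness exponent `m` defaults to 2, and every function name is hypothetical. Memberships follow the standard Fuzzy k-Means rule u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m−1)), and each new centroid is the mean of all points weighted by u_ij^m.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: an in-memory list of points stands in for the OLAP database.

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def memberships(point, centroids, m):
    """Degree of membership of `point` in each cluster (fuzziness m > 1)."""
    dists = [distance(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that cluster.
    for j, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(centroids))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exp for dk in dists) for dj in dists]

def map_step(chunk, centroids, m):
    """Step 1 (Map): emit <cluster_id, (u^m * point, u^m)> pairs for one chunk."""
    for point in chunk:
        for j, u in enumerate(memberships(point, centroids, m)):
            w = u ** m
            yield j, ([w * x for x in point], w)

def reduce_step(pairs, dim):
    """Step 2 (Reduce): sum this thread's own pairs per cluster_id."""
    partial = defaultdict(lambda: ([0.0] * dim, 0.0))
    for j, (wx, w) in pairs:
        s, total = partial[j]
        partial[j] = ([a + b for a, b in zip(s, wx)], total + w)
    return dict(partial)

def worker(chunk, centroids, m):
    # Map then Reduce inside the same thread, with no wait on other threads.
    return reduce_step(map_step(chunk, centroids, m), len(centroids[0]))

def final_reduce(partials, centroids):
    """Step 3 (FinalReduce): merge every thread's partial sums into new centroids."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    weights = [0.0] * len(centroids)
    for partial in partials:
        for j, (s, w) in partial.items():
            sums[j] = [a + b for a, b in zip(sums[j], s)]
            weights[j] += w
    return [[a / weights[j] for a in sums[j]] if weights[j] > 0 else list(centroids[j])
            for j in range(len(centroids))]

def fuzzy_kmeans(points, centroids, m=2.0, threads=4, iterations=10):
    # Round-robin split of the points into one chunk per thread.
    chunks = [points[i::threads] for i in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(iterations):
            # list() blocks until every thread has finished Map and Reduce,
            # which is the only barrier before FinalReduce.
            partials = list(pool.map(worker, chunks,
                                     [centroids] * threads, [m] * threads))
            centroids = final_reduce(partials, centroids)
    return centroids
```

For example, `fuzzy_kmeans(points, [[0.0, 0.0], [10.0, 10.0]], threads=3)` refines two initial centroids over ten iterations. Because each thread reduces only its own pairs, FinalReduce merges a handful of per-cluster partial sums instead of every emitted pair. In CPython the global interpreter lock keeps these threads from speeding up pure-Python arithmetic, so the sketch shows the data flow rather than a performance gain.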


Except where otherwise noted, this item is distributed under the following license:
Attribution-NonCommercial-NoDerivatives 4.0 International
