Ομαδοποίηση μεγάλης κλίμακας δεδομένων στην Πλατφόρμα Spark

Αυδάλας, Στέφανος

Large-scale data clustering in Spark

Master Thesis

Author

Αυδάλας, Στέφανος

Date

2018-06

Abstract

The modern age is often characterized as the "Big Data" era because of the growth in data produced daily. These data are now the primary source for knowledge mining. Recent estimates indicate that the volume of data produced every two days equals the amount of data created from the beginning of mankind until 2003. Traditional data analysis tools are not sufficient for analyzing data at this scale, so new tools for large-scale data analysis are constantly being developed. Motivated by these needs, the present dissertation deals with large-scale data clustering on the Spark platform. The first chapter introduces the reader to the concept of large-scale data. More specifically, we present its development and challenges, as well as the ways in which such data can be produced and acquired. The second chapter introduces the concept of clustering. We present the distance and similarity measures for each type of data, as well as the measures used for clusters. The categories of clustering algorithms are then listed, along with the categories used for clustering large-scale data. The next chapter presents the Spark platform, which is widely used for large-scale data analysis. More specifically, the components of the platform and its various libraries are presented, including MLlib and PySpark, which are used for the analysis in this dissertation. The last chapter describes and compares the results of the clustering algorithms found in the MLlib library, using various evaluation measures calculated for each case.

Postgraduate Studies Programme

Applied Statistics

Department

School of Finance and Statistics. Department of Statistics and Insurance Science

Number of pages

122

Language

Greek

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/11390

Collections

Department of Statistics and Insurance Science

Except where otherwise noted, this item's license is described as
Attribution-NonCommercial-NoDerivatives 4.0 International