Διατύπωση μετρικών κατανομής πόρων και βέλτιστου τρόπου εκτέλεσης κατανεμημένων αλγορίθμων μηχανικής μάθησης
Specification of resource allocation metrics and optimized execution patterns of distributed machine learning algorithms
Abstract
Big Data is usually defined by three characteristics known as the three Vs (Volume, Velocity and Variety). The term refers to data that is very large, dynamic and complex, and therefore difficult to record, store, process and analyze with traditional data processing applications. The conditions imposed by Big Data consequently present serious challenges on several levels, including data clustering. In general, Big Data clustering techniques fall into two categories: single-machine clustering techniques and multiple-machine clustering techniques. This thesis examines the behavior of classification algorithms applied to the analysis of large volumes of data on different technologies and cloud architectures. Cloud computing is a powerful technology for executing numerous complex calculations. It eliminates the need to maintain expensive IT equipment, dedicated space and software. Processing the massive and growing volume of data produced through cloud computing is a time-consuming task that demands a large computational infrastructure for successful data processing and analysis. The purpose of classification algorithms is to understand and extract value from large collections of structured or unstructured data. With large volumes of unstructured data, it makes sense to separate the data into logical clusters before analyzing it. In this way, classification allows us to gain an overall perspective and to form logical structures before continuing the analysis. We then present the most popular algorithms, which are widely used to solve classification problems. To ensure that the final conclusions are sound and objective, we run each algorithm separately on a common dataset. Cloud resources need to be distributed and scheduled in such a way that providers achieve their goals and users meet the requirements of their applications at minimum cost.
We call this the cloud resource allocation problem. Resource allocation is traditionally viewed as an optimization problem, and in its general form it is NP-hard. Limited resources must be allocated among competing activities so that both parties, providers and users, achieve their goals. This extensive review aims to examine and analyze the numerous issues that arise in cloud resource allocation.
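To illustrate why resource allocation is usually treated as an optimization problem, the sketch below places resource requests on identical hosts using the first-fit decreasing heuristic for bin packing. It is only an illustrative assumption and not the method studied in the thesis: the `Host` class, the single integer capacity (for example CPU cores) and the sample request sizes are all hypothetical. Because finding the minimum number of hosts is NP-hard in general, a heuristic such as this one trades optimality for speed.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical host with one resource dimension (e.g. CPU cores)."""
    capacity: int
    placed: list = field(default_factory=list)

    @property
    def free(self) -> int:
        # Remaining capacity after all requests placed so far.
        return self.capacity - sum(self.placed)


def first_fit_decreasing(requests, capacity):
    """Place requests on hosts with the first-fit decreasing heuristic.

    Requests are handled from largest to smallest; each goes to the first
    host that still has room, and a new host is opened when none does.
    """
    if any(size > capacity for size in requests):
        raise ValueError("a request exceeds the capacity of a single host")
    hosts = []
    for size in sorted(requests, reverse=True):
        for host in hosts:
            if host.free >= size:
                host.placed.append(size)
                break
        else:
            # No existing host can take this request: open a new one.
            new_host = Host(capacity)
            new_host.placed.append(size)
            hosts.append(new_host)
    return hosts


if __name__ == "__main__":
    # Hypothetical request sizes, in the same unit as the host capacity.
    for index, host in enumerate(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)):
        print(index, host.placed, "free:", host.free)
```

A real allocator would also have to handle several resource dimensions, pricing and service-level constraints, which is what makes the general problem hard.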