Efficient data management using multidimensional indexes in the cloud
Subject
Apache Hadoop ; MapReduce ; Electronic data processing -- Distributed processing ; Cloud computing
Abstract
This master's thesis deals with the management of large amounts of data in the cloud. Specifically, it studies whether R-tree indexes can be used to query multidimensional data stored in the cloud effectively. It first presents cloud computing, the multidimensional nature of data, and the need for indexes when querying such data. The next chapter presents the MapReduce programming model and its implementation, the Hadoop framework, on which the approach is based. Related studies are then reviewed, followed by a presentation of the R-tree index structure and a description of how a range query is executed over an R-tree index. The following chapters cover the design and implementation of the approach, including details of the code developed, and describe the environment in which the experimental analysis was carried out. In the experiments, range queries were executed both with and without an R-tree index on input files that differ in the number of points they contain and in the dimensionality of those points. Finally, the conclusions drawn from the experimental study are presented, showing that using R-tree indexes does improve Hadoop's query performance on multidimensional data. Suggestions for future research are also made.
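As an illustration of the range query over an R-tree mentioned in the abstract, the following is a minimal single-machine sketch in Python. It is not the thesis's Hadoop implementation: the names `Node`, `build`, and `range_query` are hypothetical, the tree is built with a simple sort-and-pack scheme assumed here for brevity, and the input is assumed to be a non-empty list of points of equal dimension.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def bounding_box(points: Sequence[Point]) -> Box:
    """Smallest axis-aligned box enclosing all given points."""
    dims = range(len(points[0]))
    lower = tuple(min(p[d] for p in points) for d in dims)
    upper = tuple(max(p[d] for p in points) for d in dims)
    return lower, upper


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap in every dimension."""
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d]
               for d in range(len(a[0])))


def contains(box: Box, p: Point) -> bool:
    """True if the point lies inside the box (borders included)."""
    return all(box[0][d] <= p[d] <= box[1][d] for d in range(len(p)))


@dataclass
class Node:
    mbr: Box                                           # minimum bounding rectangle
    children: List["Node"] = field(default_factory=list)  # set on inner nodes
    points: List[Point] = field(default_factory=list)     # set on leaf nodes


def chunks(items: list, size: int) -> List[list]:
    return [items[i:i + size] for i in range(0, len(items), size)]


def build(points: Sequence[Point], capacity: int = 4) -> Node:
    """Assumed packing scheme: sort the points, fill leaves, then group upward."""
    level = [Node(bounding_box(c), points=c)
             for c in chunks(sorted(points), capacity)]
    while len(level) > 1:
        # The MBR of an inner node encloses the corners of its children's MBRs.
        level = [Node(bounding_box([corner for n in g for corner in n.mbr]),
                      children=g)
                 for g in chunks(level, capacity)]
    return level[0]


def range_query(node: Node, query: Box) -> List[Point]:
    """Return the points inside `query`, skipping subtrees whose MBR misses it."""
    if not intersects(node.mbr, query):
        return []
    if not node.children:  # leaf: test the stored points directly
        return [p for p in node.points if contains(query, p)]
    found: List[Point] = []
    for child in node.children:
        found.extend(range_query(child, query))
    return found


# Usage: a 10 x 10 grid of 2-D points and a query box over part of it.
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
root = build(grid)
hits = range_query(root, ((2.0, 3.0), (4.0, 5.0)))
```

The index returns the same points as a full scan of the input; the difference is that subtrees whose bounding rectangle does not intersect the query box are never visited.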