Comparative analysis of SQL queries performance on vehicle sensor data in RDBMS and Apache Spark
Comparative analysis of SQL query performance between RDBMS and Apache Spark for vehicle data

Master Thesis
Author
Κουτσιμπογιώργος, Γρηγόριος
Koutsimpogiorgos, Grigorios
Date
2023-09
Keywords
Apache Spark ; RDBMS ; Big data
Abstract
In today's digital era, the exponential growth in data volume, variety, and velocity has made it necessary to explore advanced techniques for storing and analyzing big data. Continuous improvements in hardware have led to new technologies for data storage and processing that extend traditional data storage technologies and analysis frameworks. Many organizations are turning to distributed computing frameworks to process and analyze large datasets.
One of the most popular technologies in this field is Apache Spark, a fast, general-purpose cluster computing system. However, traditional relational databases, such as Oracle, are still widely used for data storage and retrieval. In this thesis, we compare the performance of a specific set of SQL queries on a vehicle sensor dataset, executed on both a traditional RDBMS and Apache Spark. Our goal is to determine whether modern technologies can perform as well as, or even better than, relational databases when processing and analyzing large datasets such as ours. Additionally, the thesis explores optimization techniques that can improve the performance of both Spark and Oracle.