Distributed stream and event processing pipeline in serverless architecture
Master Thesis
Author
Fotiadis, Orestis
Φωτιάδης, Ορέστης
Date
2021-06
Supervisor
Kyriazis, Dimosthenis
Κυριαζής, Δημοσθένης
Keywords
Big data; Cloud computing; Stream analytics; Real-time processing
Abstract
The increasing interconnection of our world over the last decade has led to exponential growth in the data emitted by personal devices, IoT sensors and other activities of our society. A very large share of this data is produced in the form of continuous streams. Real-time data stream processing has become a crucial operation for several business domains; however, processing large volumes of data, often from different sources, remains a challenge both technologically and operationally. These needs have led to the emergence of open-source and commercial systems that aim to manage and analyze data streams. These systems grow in importance as the number of data stream sources continues to multiply.
This dissertation presents the theoretical background on data processing and the two prevalent architectures for stream processing: the Lambda and Kappa architectures. We also present popular open-source and commercial products in the wider data streaming domain. In the second part of the dissertation, we present the design and implementation of a stream analytics pipeline built on top of Microsoft’s Azure cloud platform. We use the Twitter API to stream and analyze data, and we run benchmarks to examine the solution’s performance. Finally, we present the solution’s artifacts as deployable templates.