A study of new technologies for managing large volumes of data streams

Καπότης, Χρήστος

Master Thesis

Author

Καπότης, Χρήστος

Date

2021-02

Abstract

This thesis implements a real-time pipeline that depicts the momentum of the two candidates for the US presidency, using some of the most popular big data technologies: Apache Spark Streaming, Kafka and the ELK Stack (Elasticsearch, Logstash and Kibana). More specifically, as shown in Figure 1, two Python producers generate fake sentences and send them to a Kafka cluster. Spark then reads the sentences about the two candidates from the Kafka cluster as a stream and applies sentiment analysis to classify each one as positive, negative or neutral. Once the analysis is complete and the data is in the required form, Spark writes the results back to the Kafka cluster. Finally, Logstash transfers the data from the Kafka cluster to Elasticsearch, and Kibana visualizes the results in the appropriate diagrams.
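The data flow described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis code: the candidate names, sentence templates, word lists and the `send` callback are all hypothetical, and the word-count classifier merely stands in for whatever sentiment model the thesis actually uses. In a real deployment `send` would presumably wrap a Kafka producer call, and the classification step would run inside Spark.

```python
import json
import random

# Hypothetical candidate names and sentence templates (not from the thesis).
CANDIDATES = ("candidate_a", "candidate_b")
TEMPLATES = (
    "{name} gave a great speech",
    "{name} made a terrible decision",
    "{name} visited another state",
)

# Tiny illustrative lexicon, standing in for a real sentiment model.
POSITIVE = {"great", "good", "strong"}
NEGATIVE = {"terrible", "bad", "weak"}


def make_message(candidate, rng):
    """Build one fake sentence about a candidate, as a producer would."""
    text = rng.choice(TEMPLATES).format(name=candidate)
    return {"candidate": candidate, "text": text}


def classify(text):
    """Label text positive, negative or neutral by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def enrich(message):
    """Attach a sentiment label, mirroring the stream-processing step."""
    return {**message, "sentiment": classify(message["text"])}


def run_pipeline(n_per_candidate, send, seed=0):
    """Generate, classify and hand each record to `send` as JSON bytes."""
    rng = random.Random(seed)
    for candidate in CANDIDATES:
        for _ in range(n_per_candidate):
            record = enrich(make_message(candidate, rng))
            send(json.dumps(record).encode("utf-8"))
```

Calling `run_pipeline(2, print)` prints four JSON payloads, two per candidate. Passing the sink in as a callback keeps the sketch runnable without a broker, while leaving room to substitute a real producer.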

Postgraduate Studies Programme

Information Systems and Services

Department

School of Information and Communication Technologies. Department of Digital Systems

Language

Greek

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/13273
http://dx.doi.org/10.26267/unipi_dione/696

Collections

Department of Digital Systems
