Study of new technologies for managing large volumes of data streams
Abstract
This paper implements a real-time pipeline that depicts the momentum of the two candidates for the US presidency, using some of the most popular big data technologies: Apache Spark Streaming, Apache Kafka and the ELK Stack (Elasticsearch, Logstash and Kibana).
More specifically, as shown in Figure 1, two Python producers generate fake sentences and send them to a Kafka cluster. Spark then reads the sentences about the two candidates from the Kafka cluster as a stream and applies sentiment analysis to classify each one as positive, negative or neutral. Once the sentiment analysis is complete and the data is in the required form, Spark writes the results back to the Kafka cluster. Finally, Logstash transfers the data from the Kafka cluster to Elasticsearch, and Kibana visualizes the results through appropriate diagrams.
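The stages described above can be illustrated with a minimal, broker-free sketch in plain Python. It is not the thesis code: the candidate labels, the word lists, the lexicon-based classifier and the `momentum` aggregate (positive minus negative counts) are all hypothetical stand-ins, and the Kafka, Spark and Logstash hops are replaced by ordinary function calls that pass JSON strings.

```python
import json
import random

# Hypothetical placeholders: the abstract names no candidates, topics or word lists.
CANDIDATES = ["candidate_a", "candidate_b"]
POSITIVE = {"great", "strong", "honest"}
NEGATIVE = {"weak", "dishonest", "terrible"}


def fake_sentence(candidate, rng):
    """Producer stage: build one fake sentence about a candidate."""
    adjective = rng.choice(sorted(POSITIVE | NEGATIVE) + ["average"])
    # A real producer would send this message to a Kafka topic instead of returning it.
    return {"candidate": candidate, "text": f"{candidate} is {adjective}"}


def classify(text):
    """Processing stage: toy lexicon score standing in for the real sentiment analysis."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def enrich(message):
    """Attach the sentiment label and serialize to JSON, as a result record might look."""
    record = dict(message, sentiment=classify(message["text"]))
    return json.dumps(record)


def momentum(records):
    """Assumed aggregate: positive minus negative count per candidate."""
    weight = {"positive": 1, "negative": -1, "neutral": 0}
    totals = {c: 0 for c in CANDIDATES}
    for raw in records:
        rec = json.loads(raw)
        totals[rec["candidate"]] += weight[rec["sentiment"]]
    return totals


if __name__ == "__main__":
    rng = random.Random(0)
    stream = [enrich(fake_sentence(rng.choice(CANDIDATES), rng)) for _ in range(10)]
    print(momentum(stream))
```

In the pipeline the abstract describes, `fake_sentence` corresponds to the Python producers, `classify` and `enrich` to the Spark job, and `momentum` to the kind of aggregation a Kibana diagram might display.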