Study of technologies and research systems for big scientific data analytics
This paper studies the existing technologies and research systems dedicated to the analysis of very large volumes of data, commonly referred to as “Big Data”. Such datasets may be recorded data or data streamed from Internet of Things devices, and they are analyzed for a higher-level purpose, such as deriving knowledge by learning a behavior. The paper first focuses on the rapid, everyday generation of data and on how it can be managed with Big Data analytics systems, for which the Java-based Apache Hadoop framework is the benchmark. It then reviews the existing technologies used for processing Big Data. Most of these technologies can be integrated with the Hadoop framework to form an enriched Big Data analytics ecosystem. The next chapter describes the flexibility a Hadoop cluster offers in terms of adding nodes to strengthen distributed processing. The installation procedure for creating a Hadoop cluster integrated with several components and technologies is then presented. Subsequently, a number of experiments examine and evaluate the performance of a Hadoop cluster and of some components of the Hadoop ecosystem as a function of the number of active data nodes. Finally, the observations, conclusions, and comments are gathered in the last chapter, together with thoughts on future uses of Big Data analytics systems.