Στατιστικές τεχνικές παλινδρόμησης για την ανάλυση μεγάλων δεδομένων
Regression techniques for the analysis of big data
Σταθόπουλος, Γεώργιος Α.
Keywords: Regression analysis; Linear regression; Big data; Statistical analysis
The collection of large amounts of data, usually involving many features, has created a need for statistical techniques suited to studying their structure and drawing useful conclusions from them. Such an analysis requires either a special adaptation of the available conventional statistical techniques or the development of alternative ones. In this paper we briefly present techniques used for the analysis of large amounts of data and describe the respective algorithms along with their code in the R environment. In addition, a comparative study of these techniques on real data is carried out. Specifically, tree-based regression techniques are used, improved by exploiting machine learning algorithms. Techniques based on the classical linear regression model are also used, in which a restriction (penalty) is applied to the model coefficients in order to address multicollinearity, reduce the variance of the coefficient estimates, and achieve greater prediction accuracy.
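As a rough illustration of how a penalty on the coefficients counteracts multicollinearity, the following minimal sketch fits a ridge-type (squared-norm) penalty for two nearly collinear predictors. It is written in Python rather than R, and is not the code of this work: the function name `ridge_2d`, the penalty value, and the toy data are all hypothetical choices made for illustration, and the ridge penalty is assumed here as one common example of a penalized linear model.

```python
def ridge_2d(x1, x2, y, lam):
    """Ridge coefficients for two centered predictors and a centered response.

    Solves (X'X + lam * I) b = X'y in closed form for the 2x2 case.
    With lam = 0 this reduces to ordinary least squares.
    """
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Toy, made-up data (not from the study): x2 is almost a copy of x1,
# so the unpenalized problem is ill-conditioned.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.3, -1.8, 0.2, 2.1, 3.8]

ols = ridge_2d(x1, x2, y, lam=0.0)    # no penalty
ridge = ridge_2d(x1, x2, y, lam=1.0)  # hypothetical penalty value

print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

Under these assumptions, the penalized coefficient vector has a smaller norm than the unpenalized one, which is the shrinkage effect the abstract refers to. In practice the penalty value would be chosen by a procedure such as cross-validation rather than fixed by hand.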