Statistical regression techniques for the analysis of big data
Keywords
Regression analysis; Linear regression; Big data; Statistical analysis
Abstract
The collection of large volumes of data, which typically involve many features, has
created a need for statistical techniques that help in studying the structure of such
data and in drawing useful conclusions from them. Such analysis has been found to
require either a special adaptation of conventional statistical techniques or the
development of alternative ones.
In this work we briefly present techniques used for the analysis of large data sets and
describe the corresponding algorithms, together with their implementation in R. In
addition, the techniques are compared on real data.
Specifically, we use tree-based regression techniques that are improved by means of
machine learning algorithms. We also use techniques based on the classical linear
regression model in which a restriction (penalty) is imposed on the model coefficients
in order to address multicollinearity, reduce the variance of the coefficient estimates
and achieve greater prediction accuracy.
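As an illustration of how a penalty on the coefficients reduces their variance under multicollinearity, the following is a minimal sketch of ridge regression (an L2 penalty), which is one common form of penalized linear regression. The abstract does not say which penalties the work uses (lasso or elastic net are other possibilities), and the work's own code is in R, so this NumPy version, the helper names `ridge_fit` and `simulate_coef_spread`, and the simulation settings are all hypothetical choices for illustration only.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate on centred data; the intercept is not penalised.

    beta = (Xc^T Xc + lam * I)^{-1} Xc^T yc. With lam = 0 this reduces to
    ordinary least squares (OLS).
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def simulate_coef_spread(lam, n_reps=200, n=50, seed=0):
    """Variance of each estimated coefficient over repeated simulated samples.

    Hypothetical setup: two nearly collinear predictors and true model
    y = 1 + 2*x1 + 2*x2 + noise.
    """
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(n_reps):
        x1 = rng.normal(size=n)
        x2 = x1 + rng.normal(scale=0.01, size=n)  # x2 is almost a copy of x1
        X = np.column_stack([x1, x2])
        y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
        _, beta = ridge_fit(X, y, lam)
        coefs.append(beta)
    return np.var(np.array(coefs), axis=0)


if __name__ == "__main__":
    # Same seed, so both estimators see exactly the same simulated samples.
    print("OLS coefficient variances:  ", simulate_coef_spread(lam=0.0))
    print("ridge coefficient variances:", simulate_coef_spread(lam=1.0))
```

With nearly collinear predictors the OLS coefficients swing widely from sample to sample, while the ridge estimates stay close to each other. The cost is some bias, since the penalty shrinks the coefficients towards zero. In practice the penalty strength `lam` would be chosen by cross-validation rather than fixed as it is here.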