Logistic regression for events with a low frequency of occurrence
Logistic regression in rare events data
The purpose of this paper is to describe the statistical problem of estimating rare events with logistic regression. Rare events are events that occur with low frequency (less than 5%). First, the basic concepts of logistic regression are described and the main problems in the statistical analysis of such data are analyzed. These problems are the inefficiency of the commonly used strategies for collecting rare events data and the difficulty of explaining and predicting such events. Next, the correction methods proposed by Gary King and Langche Zeng (2001) are explained in detail, and their effectiveness in reducing bias is compared using simulated data. Finally, a real dataset on diabetes is used to show more clearly that the corrections improve the estimates.
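As an illustration of one of the corrections of King and Zeng (2001), the following is a minimal sketch of prior correction of the intercept. It assumes the logistic model was fitted on a sample in which events are over-represented. The names `prior_corrected_intercept`, `b0_hat`, `tau` (the assumed population fraction of events) and `ybar` (the fraction of events in the sample) are hypothetical and chosen only for this example. It is not the implementation used in the paper.

```python
import math


def prior_corrected_intercept(b0_hat: float, tau: float, ybar: float) -> float:
    """Prior correction of a logistic regression intercept.

    b0_hat : intercept estimated on the (event-enriched) sample
    tau    : assumed fraction of events in the population (hypothetical input)
    ybar   : fraction of events in the sample
    """
    if not (0.0 < tau < 1.0 and 0.0 < ybar < 1.0):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    # Subtract ln[((1 - tau) / tau) * (ybar / (1 - ybar))] from the intercept;
    # the slope coefficients are left unchanged by this correction.
    correction = math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))
    return b0_hat - correction


def predicted_probability(intercept: float, slopes: list, x: list) -> float:
    """Logistic probability for one observation x under the given coefficients."""
    z = intercept + sum(b * v for b, v in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-z))
```

When the sample fraction equals the population fraction (`ybar == tau`), the correction term is zero and the intercept is unchanged. When events are over-sampled, the intercept is shifted downward, which lowers the predicted probabilities.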