Σύγκριση τεχνικών διαχωριστικής ανάλυσης
Comparison of discrimination techniques
Keywords
Discriminant analysis ; Discriminant methods ; Classification ; Supervised learning
Abstract
Discriminant Analysis is of particular interest both as a field of scientific research and for its practical applications across a wide range of business activities. The main purpose of discrimination is to classify each experimental unit under study into one of several known populations, based on the values of specific observed characteristics. This is achieved by forming an appropriate discrimination rule, according to which each experimental unit is assigned to one of the available populations.
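One common way to write such a rule is sketched below for illustration only. The symbols $x$ (the vector of observed characteristics of a unit), $\Pi_1, \dots, \Pi_K$ (the known populations) and $d$ (the rule itself) are notation assumed for this sketch and are not taken from the thesis.

```latex
d(x) = \arg\max_{k \in \{1, \dots, K\}} P(\Pi_k \mid x)
```

In this form a unit is assigned to the population with the largest estimated posterior probability. Individual methods differ in how they estimate these probabilities.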
This diploma thesis presents in detail the theoretical framework of the Multinomial Logistic Regression method and of the ID3, C4.5 and CART algorithms. Specific examples of each method are then given to make their underlying concepts easier to understand. Finally, the Multinomial Logistic Regression method and the C4.5 algorithm are applied extensively to a primary data set in order to separate the taxpayers of a given country by marital status, which is particularly useful to the tax authorities for the tax administration of non-residents of foreign nationality. The efficiency and separation accuracy of the two methods are judged to be entirely comparable on data sets that are complete on the characteristics under consideration, with no missing values. When features with a high percentage of missing values are incorporated and subsequently handled, the C4.5 algorithm appears more robust, showing better separation accuracy than Multinomial Logistic Regression.
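The three tree algorithms named above are usually distinguished by their split criterion: information gain for ID3, gain ratio for C4.5 and Gini impurity for CART. The following is a minimal sketch of these criteria in plain Python, not the implementation used in the thesis. The toy records, the attribute names `joint_return` and `dependants`, and all function names are hypothetical, and the handling of missing values is not covered.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gini(values):
    """Gini impurity of a list of discrete values (criterion used by CART)."""
    total = len(values)
    return 1.0 - sum((n / total) ** 2 for n in Counter(values).values())


def split_by(rows, labels, attribute):
    """Group the class labels by the value each row takes on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    return groups


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on `attribute` (ID3 criterion)."""
    groups = split_by(rows, labels, attribute)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attribute):
    """Information gain divided by split information (C4.5 criterion)."""
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it cannot split the data
    return information_gain(rows, labels, attribute) / split_info


# Hypothetical toy data: four taxpayers, two attributes, marital status as class.
rows = [
    {"joint_return": "yes", "dependants": "yes"},
    {"joint_return": "yes", "dependants": "no"},
    {"joint_return": "no", "dependants": "no"},
    {"joint_return": "no", "dependants": "yes"},
]
labels = ["married", "married", "single", "single"]

for attribute in ("joint_return", "dependants"):
    print(attribute,
          information_gain(rows, labels, attribute),
          gain_ratio(rows, labels, attribute))
```

In this toy set `joint_return` separates the two classes perfectly while `dependants` carries no information about them, so a tree built on either criterion would split on `joint_return` first.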