Multiple layer hybrid classification for Android malware detection
Master's Thesis
Author
Ανυφαντάκης, Κωνσταντίνος
Anyfantakis, Konstantinos
Date
2021-06
Supervisor
Ξενάκης, Χρήστος
Xenakis, Christos
Abstract
Because of the ever-increasing number of mobile devices running the Android operating system, their widespread use and their diverse application capabilities, such devices have become lucrative targets for malicious apps. Despite mitigation efforts, mobile malware has been spreading at an alarming rate, a trend made more pronounced by the fact that Android is an open platform that is rapidly dominating rival systems in the smart mobile device market. Experts gain significant insight into the mechanics of malware through static and dynamic analysis, and machine learning is frequently used to detect previously unknown malicious software. However, the Android operating system and the malware that targets it are constantly changing. As a result, training a machine learning model on obsolete malware may degrade its ability to detect more recent samples; for this reason, a secondary goal of this thesis is to introduce the Omnidroid dataset and the use of AndroPyTool. In addition, a new wave of Android malware families with strong evasion capabilities has recently emerged, and these families are far harder to identify with traditional approaches. Various malware detection approaches based on static, dynamic, and hybrid analysis have recently been proposed to make Android devices safer; however, as malware continues to evolve, these methods are becoming increasingly ineffective and imprecise. This thesis not only demonstrates how to employ parallel classifiers that form stacked ensemble models to identify zero-day Android malware, but also discusses how such models improve malware detection when applied to both types of features: static features obtained from static analysis and dynamic features obtained from dynamic analysis.
Furthermore, the proposed approach fuses the outputs of the classifiers trained separately on each of these two feature types, aggregating the predictions of the parallel classifiers with a soft-voting ensemble used as an example. The final prediction accuracy on the given dataset was approximately 91%.
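To make the fusion step more concrete, the sketch below shows generic soft voting applied to one sample: each classifier (here, one trained on static features and one on dynamic features) outputs class probabilities, the probabilities are averaged, optionally with weights, and the class with the highest averaged probability is predicted. This is a minimal illustration under stated assumptions, not the thesis's implementation. The names `soft_vote`, `fuse_static_dynamic` and `CLASSES`, the binary benign/malware class order, and the default of equal weights are all hypothetical. A stacked ensemble, by contrast, would train a meta-classifier on these per-model outputs rather than simply averaging them.

```python
from typing import List, Optional, Sequence

# Hypothetical class order assumed for this sketch: index 0 = benign, 1 = malware.
CLASSES = ("benign", "malware")


def soft_vote(prob_vectors: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of several classifiers' class-probability vectors.

    Each entry of prob_vectors holds one classifier's predicted probabilities
    for a single sample, in CLASSES order. Weights default to equal (assumption).
    """
    if not prob_vectors:
        raise ValueError("need at least one probability vector")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all probability vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(prob_vectors)
    if len(weights) != len(prob_vectors):
        raise ValueError("one weight per classifier is required")
    total = float(sum(weights))
    # Average each class's probability across classifiers.
    return [
        sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]


def fuse_static_dynamic(p_static: Sequence[float],
                        p_dynamic: Sequence[float],
                        weights: Optional[Sequence[float]] = None) -> str:
    """Fuse a static-feature and a dynamic-feature prediction by soft voting."""
    fused = soft_vote([p_static, p_dynamic], weights)
    # Pick the class with the highest averaged probability.
    best = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best]
```

For example, if the static-feature model outputs `[0.3, 0.7]` and the dynamic-feature model outputs `[0.6, 0.4]`, the equally weighted average is roughly `[0.45, 0.55]`, so `fuse_static_dynamic([0.3, 0.7], [0.6, 0.4])` returns `"malware"`. Giving the dynamic model three times the weight (`weights=[1, 3]`) shifts the decision to `"benign"`, which illustrates how weighting lets one feature type dominate the vote.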