The widespread occurrence of mobile malware still poses a significant security threat to billions of smartphone users. To counter this threat, several machine learning-based detection systems have been proposed within the last decade. These methods have achieved impressive detection results in many settings, without requiring the manual crafting of signatures. Unfortunately, recent research has demonstrated that these systems often suffer from significant performance drops over time if the underlying distribution changes---a phenomenon referred to as concept drift. So far, however, it remains an open question which factors are the main causes of the drift in the data and, in turn, of the drop in performance of current detection systems.
To address this question, we present a framework for the in-depth analysis of datasets affected by concept drift. The framework allows one to gain a better understanding of the root causes of concept drift, a fundamental stepping stone for building robust detection methods. To examine the effectiveness of our framework, we use it to analyze a commonly used dataset for Android malware detection as a first case study. Our analysis yields two key insights into the drift that affects several state-of-the-art methods. First, we find that most of the performance drop can be explained by the rise of two malware families in the dataset. Second, we can determine how the evolution of certain malware families and even goodware samples affects the classifier's performance. Our findings provide a novel perspective on previous evaluations conducted using this dataset and, at the same time, show the potential of the proposed framework to obtain a better understanding of concept drift in mobile malware and related settings.
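To make the phenomenon concrete: the performance drop described above is typically surfaced by time-aware evaluation, i.e., training on an early period and testing on successive later periods without retraining. The minimal sketch below (not the authors' framework; all data, names, and the drift model are synthetic assumptions) trains a one-feature threshold classifier on period 0 and shows its accuracy decaying as the simulated malware distribution drifts toward goodware.

```python
# Illustrative sketch of time-aware evaluation under concept drift.
# Hypothetical synthetic data: the malware class's feature mean drifts
# toward the goodware mean over time, so a fixed classifier degrades.
import random

random.seed(0)

def sample_period(t, n=200):
    """Synthetic (feature, is_malware) pairs for time period t.
    Only the malware distribution drifts (mean 3.0 -> negative)."""
    data = []
    for _ in range(n):
        is_malware = random.random() < 0.5
        mean = (3.0 - 0.8 * t) if is_malware else 0.0
        data.append((random.gauss(mean, 1.0), is_malware))
    return data

# "Train" on period 0: threshold at the midpoint of the class means.
train = sample_period(0)
mal = [x for x, y in train if y]
good = [x for x, y in train if not y]
threshold = (sum(mal) / len(mal) + sum(good) / len(good)) / 2

# Evaluate on later periods WITHOUT retraining: accuracy decays.
accs = []
for t in range(5):
    test = sample_period(t)
    correct = sum((x > threshold) == y for x, y in test)
    accs.append(correct / len(test))
print([round(a, 2) for a in accs])
```

A per-family breakdown of exactly this kind of decay curve is what the framework in the talk is designed to explain.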
This seminar will be a dry-run of the talk to be given at the AISec workshop 2023, co-located with ACM CCS.
Theo Chow is a dedicated PhD candidate under the guidance of Professor Fabio Pierazzi. He is an active member of the Cyber Security Group within the Department of Informatics at King’s College London. Prior to embarking on his doctoral journey at King's, Theo completed his Master of Science (MSc) in Advanced Microelectronics and Computer Systems at the University of Bristol, following a Bachelor of Engineering (BEng) in Electronics Engineering at the University of Warwick.
Theo's research passion lies at the intersection of eXplainable AI (XAI), Cybersecurity, Concept Drift, and Machine Learning Model Robustness. His work addresses the growing concerns surrounding the reliability of Machine Learning models and delves into how XAI can offer solutions. He is dedicated to demystifying the 'black box' nature of these models, ultimately empowering practitioners to understand and trust these increasingly influential systems.