Current and future applications of artificial intelligence (or should we rather stick to the more appropriate term “Machine Learning”?) in cybersecurity have been among the hottest topics of discussion in recent years. Some experts, especially those employed by anti-malware vendors, see ML-powered malware detection as the ultimate solution that will replace all previous-generation security tools. Others are more cautious: they see great potential in such products but warn about the inherent challenges of current ML algorithms.

One particularly egregious example of “AI security gone wrong” was covered in an earlier post by my colleague John Tolbert. In short, to reduce the number of false positives produced by an AI-based malware detection engine, the developers added a second engine that whitelisted popular software and games. Unfortunately, the second engine worked a bit too well: hackers could disguise any malware as innocent code simply by appending some strings copied from a whitelisted application.

Hopefully, cases like this, where bold marketing claims contradict not just common sense but reality itself and force engineers to patch their ML models' shortcomings with clumsy workarounds, are rare. Still, every ML-based security product faces the same challenge: whenever a particular file triggers a false positive, there is no way to simply tell the model to stop flagging it. After all, machine learning is not based on rules. You have to feed the model lots of training data to gradually guide it towards the correct decision, and relabeling a single sample is not enough.

This is exactly the problem the developers of Dolphin Emulator have recently faced: for quite some time, every build of their application has been flagged as malware by Windows Defender, based on Microsoft’s AI-powered behavior analysis. Each time the developers submitted a report to Microsoft, the build was dutifully added to the application whitelist and the case was closed, until the next build, with a different file hash, was released.

Apparently, given the way this cloud-based, ML-powered detection engine is designed, there is simply no way to fix a false positive once and for all future builds. At the same time, Microsoft obviously does not want to repeat Cylance’s mistake and inadvertently whitelist too much, creating potential false negatives. Thus, the developers and users of Dolphin Emulator are left with only one option: keep submitting false-positive reports and hope that sooner or later the ML engine will “change its mind” on the issue.
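To illustrate why a whitelist entry does not carry over to the next build, here is a minimal Python sketch. It assumes that the allowlist is keyed on the SHA-256 hash of each file; the post does not say exactly how Microsoft matches its entries, so hash-based matching and all names below are hypothetical.

```python
import hashlib

# Hypothetical allowlist keyed by SHA-256 digest (an assumption for
# illustration; real products may match on other file attributes too).
allowlist: set[str] = set()


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def report_false_positive(binary: bytes) -> None:
    """Whitelist exactly this file, identified only by its hash."""
    allowlist.add(sha256_hex(binary))


def is_allowed(binary: bytes) -> bool:
    """True only if this exact byte sequence was whitelisted before."""
    return sha256_hex(binary) in allowlist


# Two hypothetical builds that differ only in an embedded build number.
build_1 = b"hypothetical emulator build 1001 contents"
build_2 = b"hypothetical emulator build 1002 contents"

report_false_positive(build_1)
print(is_allowed(build_1))  # True
print(is_allowed(build_2))  # False: a one-character change yields a new hash
```

The point is the granularity rather than the hashing itself: an exact-match entry covers exactly one file, so every rebuild needs a fresh report, which is the cycle described above.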

Machine-learning-enhanced security tools are supposed to eliminate tedious manual work for security analysts; this issue, however, shows that sometimes exactly the opposite happens. Anti-malware vendors, application developers, and even users have to do extra work to overcome this ML interpretation problem. Does this mean that incorporating machine learning into an antivirus was a mistake? Of course not, but giving too much authority to an ML engine that is, in a sense, incapable of explaining its decisions and does not react well to criticism probably was.

Potential solutions to these shortcomings do exist, the most obvious being ongoing research into making machine learning models more explainable: giving insight into how they reach decisions on particular data samples instead of presenting themselves to users as a black box. However, we have yet to see commercial solutions based on this research. In the future, a broader approach to the “artificial intelligence lifecycle” will surely be needed, covering not just model development and debugging but everything from initial training data management all the way to the ethical and legal implications of AI.

By the way, we’re going to discuss the latest developments and challenges of AI in cybersecurity at our upcoming Cybersecurity Leadership Summit in Berlin. Looking forward to meeting you there! If you want to read up on Artificial Intelligence and Machine Learning, be sure to browse our KC+ research platform.
