How to Train Your AI to Mis-Identify Dragons

This week Skylight Cyber disclosed that they were able to fool a popular “AI”-based Endpoint Protection (EPP) solution into incorrectly marking malware as safe. While trying to reverse-engineer the details of the solution's Machine Learning (ML) engine, the researchers found that it contained a secondary ML model added specifically to whitelist certain types of software like popular games. Supposedly, it was added to reduce the number of false positives their "main engine" was producing. By dumping all strings contained in such a whitelisted application and simply appending them to the end of a known piece of malware, the researchers were able to avoid its detection completely, as shown in their demo video.

This finding is just another confirmation of inherent challenges of designing ML-based cybersecurity products. Here are some issues:

The advantages that ML-enhanced cybersecurity tools provide can be easily defeated if overrides are used to eliminate false positives rather than proper training of ML algorithms. ML works best when fed as much data as possible, and when products are implemented using the right combination of supervised and unsupervised ML methods. It’s possible that whitelisting would not have been necessary if sufficient algorithmic training had been performed.
ML can be gamed. Constraining data sets or simply not having enough data piped through the appropriate mix of ML algorithms can lead to bias, which can lead to missed detections in the cybersecurity realm. This can be either intentional or unintentional. In cases of intentional gaming, malicious actors select subsets of data with which to train the discriminator, while purposely omitting others. In the unintentional case, software developers may not have access to a full sample set or may simply choose to not use a full sample set during the construction of the model.
Single-engine EPP products are at a disadvantage compared to multi-engine products. Using “AI” techniques in cybersecurity, especially in EPP products, is an absolute necessity. With millions of new malware variants appearing monthly, human analysts can’t analyze and build signatures fast enough. It is infeasible to rely on signature-based AV alone, and this has been true for years. However, just because signature-based engines are not completely effective doesn’t mean that products should abandon that method in favor of a different single method. The best endpoint protection strategy is to use a mixture of techniques, including signatures, ML-enhanced heuristics, behavioral analysis, sandboxing, exploit prevention, memory analysis, and micro-virtualization. Even with an assortment of malware detection/prevention engines, EPP products will occasionally miss a piece of malicious code. For those rare cases, most endpoint security suite vendors have Endpoint Detection & Response (EDR) products to look for signs of compromise.
Marketing ML-enhanced tools as an “AI” panacea has drawbacks. ML tools are a commodity now. ML techniques are used in many cybersecurity tools, not just EPP. ML is in most data analytics programs as well. It’s a necessary component to deal with enormous volumes of data in most applications. The use of the term “AI” in marketing today suggests infallibility and internal self-sufficiency. But such tools can make mistakes, and they don’t eliminate the need for human analysts.

KuppingerCole is hosting an AImpact Summit in Munich in November where we’ll tackle these very issues. The Call for speakers is open.

For an in-depth comparison of EPP vendors, see our Leadership Compass on Endpoint Security: Anti-Malware.

Like this?

Don't like this?

How to Train Your AI to Mis-Identify Dragons