AI Series Part 5: How to Hack a Machine Learning System – This is the fifth post in a series discussing artificial intelligence and how the increased use of AI impacts modern life. In the previous posts in this series, we discussed different applications of AI and their repercussions. In this post, we start looking at the security of AI and discuss how AI can be hacked.
How to Hack a Machine Learning System
In previous posts in this series, we discussed how machine learning algorithms can unintentionally go wrong. Whether through poorly designed scoring algorithms or the inclusion of implicit biases, these algorithms can fail to do their job properly.
If AI can break unintentionally, it shouldn’t be surprising that a determined adversary can deliberately manipulate it to their own advantage. However, while the applications of AI get a lot of press, the same cannot be said of their limitations.
In fact, a recent presentation by a Microsoft employee at USENIX on the subject demonstrated that many organizations do not red team their machine learning models. Yet there are multiple ways in which these models can be attacked.
Corrupted Training Data
Machine learning algorithms are designed to be programs that learn. They take a set of data – whether curated training data or “real world” data – and build a model based upon it. This model can then be used to classify other data and to make decisions based upon those classifications.
When working with machine learning systems, the quality of the training data is critical to building a high-quality model. A machine learning system’s model is derived from the data that it sees in its training dataset and any feedback that it receives during “live” training.
This reliance on a high-quality dataset makes machine learning systems vulnerable to external interference. Depending on the training method used, a machine learning system’s internal model can be attacked in a couple of ways.
Training Data Modification
For machine learning systems trained on a corpus of labeled data, the attacker can degrade the quality of the model by modifying the training data. By inserting malicious events labeled as benign or relabeling existing “attack” data as benign, the attacker can teach the machine learning system to ignore these events. When the system goes into production, the attacker will be able to perform these specific types of attack without detection.
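To make this concrete, here is a minimal sketch of label-flipping poisoning against a simple classifier. The dataset, model, and poisoning rate are all illustrative placeholders rather than any real detection pipeline; the point is only that relabeling “attack” training samples as benign measurably reduces how many real attacks the trained model catches.

```python
# Minimal label-flipping poisoning sketch. All data, model choices, and
# numbers are illustrative placeholders, not a real detection system.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic "benign vs. attack" data (label 1 = attack).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

def attack_recall(train_labels):
    """Train on the given labels and report the share of real attacks caught."""
    model = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    preds = model.predict(X_test)
    return (preds[y_test == 1] == 1).mean()

print("recall with clean labels:   ", attack_recall(y_train))

# The attacker relabels half of the "attack" training samples as benign.
poisoned = y_train.copy()
attack_idx = np.where(poisoned == 1)[0]
flipped = np.random.default_rng(0).choice(attack_idx, size=len(attack_idx) // 2,
                                          replace=False)
poisoned[flipped] = 0

print("recall with poisoned labels:", attack_recall(poisoned))
```

In practice an attacker rarely has direct write access to a curated training set, but injecting mislabeled records into whatever data feed the model is trained on can have the same effect.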
“Low and Slow” Attacks
Some machine learning systems detect “attacks” based on anomalies that differ significantly from the norm. These systems will often build a baseline model over time by learning to accept anything that is not too anomalous and reject anything that is.
An attacker can influence these machine learning systems by causing small, steady changes to their internal models. An attack that only differs slightly from the norm may be accepted and – after it has occurred enough times – be officially considered “legitimate”. At this point, the attacker can make additional changes to continue to undermine the system and teach it to accept and ignore any malicious events.
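The sketch below illustrates the idea with a toy detector (plain NumPy, not based on any real product). The baseline is an exponential moving average that only updates on observations it accepts, and a small daily increase stays inside the tolerance band while steadily dragging that baseline upward.

```python
# Toy "low and slow" attack on an adaptive anomaly baseline.
# The detector, thresholds, and traffic volumes are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
baseline_mean = 100.0   # learned "normal" activity, e.g. requests per minute
tolerance = 15.0        # deviation the detector is willing to accept
learning_rate = 0.2     # how quickly accepted observations update the baseline

def observe(value):
    """Accept values near the baseline and fold them back into the model."""
    global baseline_mean
    if abs(value - baseline_mean) <= tolerance:
        baseline_mean += learning_rate * (value - baseline_mean)
        return "accepted"
    return "flagged"

# Instead of jumping straight to malicious volumes, the attacker adds a
# little more activity each day, always staying inside the tolerance band.
level = 100.0
status = "accepted"
for day in range(60):
    level += 2.0
    status = observe(level + rng.normal(0, 1))

print(f"attacker level: {level:.0f}, "
      f"learned baseline: {baseline_mean:.0f}, last status: {status}")
```

A detector with a fixed baseline would flag the final volume immediately; the weakness here is that everything the system has previously accepted is treated as ground truth for what “normal” looks like.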
Adversarial Machine Learning
Machine learning algorithms are designed to build a model of a certain system. They observe certain data and base their classifications and decisions on those observations.
The problem with machine learning algorithms is that they, too, are systems with observable inputs and outputs. This means that, given a machine learning algorithm, it is always possible to develop another machine learning algorithm designed to deceive or trick the first one.
For example, take the current LinkedIn content scoring algorithm. LinkedIn promotes posts that have high user engagement under the assumption that these posts are valuable, insightful, etc. One of the measures that LinkedIn uses to gauge engagement is whether or not people click the “See more” button to read the rest of a post that overflows the initial view. The logic behind this is that LinkedIn can only tell whether a user actually read a post if they engage with it in some way. While a user may not be inclined to leave a comment or respond to a post (especially since LinkedIn responses are largely “positive” and an “engaged” user may not feel positively about a post), clicking “See more” is a required step to read a long post.
Since this engagement algorithm is public knowledge, people have started optimizing their LinkedIn posts to maximize the number of readers who click “See more.” That’s why so many LinkedIn posts are set up with empty lines and perhaps only a single leading statement visible by default: curiosity drives users to click “See more” even if the post provides little or no real value. As a result, LinkedIn is more annoying to read, and those who game the system likely have greater reach than those who post valuable content but choose not to play.
If humans can figure out this algorithm and optimize their LinkedIn posts accordingly, a machine learning algorithm certainly can. While LinkedIn post optimization is a relatively benign and harmless application of adversarial machine learning, applying the same approach to AI used in cybersecurity and similar contexts has much more threatening implications.
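As a rough illustration of what this looks like in code, the sketch below mounts a black-box evasion attack against a stand-in detector: the attacker never sees the model’s internals, only its scores, and nudges a malicious sample until it is scored as benign. The detector, features, and query budget are all hypothetical, the “attacker” is a simple search loop rather than a full learning algorithm, and a real attack would also have to keep the perturbed sample functionally malicious, which this sketch ignores.

```python
# Simplified black-box evasion sketch. The "detector" is a stand-in model
# trained on synthetic data; nothing here is specific to any real product.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
detector = LogisticRegression(max_iter=1000).fit(X, y)   # label 1 = attack

def evade(sample, budget=500, step=0.1, seed=0):
    """Greedily perturb the sample, keeping changes that lower its attack score."""
    rng = np.random.default_rng(seed)
    current = sample.copy()
    score = detector.predict_proba([current])[0, 1]       # probability of "attack"
    for _ in range(budget):
        candidate = current + rng.normal(0, step, size=current.shape)
        cand_score = detector.predict_proba([candidate])[0, 1]
        if cand_score < score:
            current, score = candidate, cand_score
        if score < 0.5:                                   # now classified as benign
            break
    return current, score

# Start from an attack sample the detector currently catches.
attacks = X[y == 1]
malicious = attacks[detector.predict(attacks) == 1][0]

_, evaded_score = evade(malicious)
print("score before perturbation:", detector.predict_proba([malicious])[0, 1])
print("score after perturbation: ", evaded_score)
```

The only capability the attacker needs is the ability to query the target and observe its responses, which is exactly the “observable inputs and outputs” property described above.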
Securing Machine Learning
Machine learning algorithms can be an invaluable tool with a wide variety of applications. The ability to extract patterns and derive models from data is useful across many industries.
However, it is important to remember that machine learning algorithms can be attacked, rendering the models they create and the classifications they make untrustworthy. We hope you’ve enjoyed “AI Series Part 5: How to Hack a Machine Learning System”! The next blog in this series discusses the challenge of securing AI.