AI Series Part 5: How to Hack a Machine Learning System 

This is the fifth post in a series discussing artificial intelligence and how the increased use of AI impacts modern life.  In the previous posts in this series, we discussed different applications of AI and their repercussions.  In this post, we start looking at the security of AI and discuss how AI can be hacked. 

Machine Learning is Easily Manipulated 

In previous posts in this series, we discussed how machine learning algorithms can unintentionally go wrong.  Whether through poorly designed scoring algorithms or implicit biases absorbed from their training data, these algorithms can fail to do their jobs properly.

If AI can be broken unintentionally, it should not be surprising that a determined adversary can deliberately manipulate it to their own advantage.  However, while the applications of AI get plenty of press, their limitations receive far less attention.

In fact, a recent presentation on the subject by a Microsoft employee at USENIX demonstrated that many organizations do not red team their machine learning models.  There are, however, multiple ways in which these models can be attacked.

Corrupted Training Data 

Machine learning algorithms are, by design, programs that learn.  They take a set of data – whether curated training data or “real world” data – and build a model based upon it.  This model can then be used to classify other data and to make decisions based upon those classifications.
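
As a minimal sketch of that workflow, the example below uses scikit-learn to build a classifier from a handful of labeled events and then classify new ones.  The feature names, values, and labels are invented purely for illustration.

```python
# Minimal sketch: learn a model from labeled data, then classify new data.
# Feature values and labels are invented for illustration only.
from sklearn.ensemble import RandomForestClassifier

# Each row is a hypothetical event: [bytes_sent, login_failures, off_hours]
X_train = [
    [200,  0, 0],   # benign
    [150,  1, 0],   # benign
    [9000, 6, 1],   # attack
    [8000, 4, 1],   # attack
]
y_train = ["benign", "benign", "attack", "attack"]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)        # build the internal model from the training data

new_events = [[8500, 5, 1], [180, 0, 0]]
print(model.predict(new_events))   # classify unseen events, e.g. ['attack' 'benign']
```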

When working with machine learning systems, the quality of the training data is critical to building a high-quality model.  A machine learning system’s model is derived from the data that it sees in its training dataset and any feedback that it receives during “live” training. 

This reliance on a high-quality dataset makes machine learning systems vulnerable to external interference.  Depending on the training method used, a machine learning system’s internal model can be attacked in a couple of ways. 

Training Data Modification 

For machine learning systems trained on a corpus of labeled data, an attacker can degrade the quality of the model by modifying the training data.  By inserting malicious events labeled as benign or relabeling existing “attack” data as benign, the attacker teaches the machine learning system to ignore these events.  When the system goes into production, the attacker can perform these specific types of attacks without detection.
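
A toy sketch of this kind of label-flipping poisoning is shown below.  The features, values, and choice of classifier are assumptions made for illustration; the point is simply that relabeling the scan-like training samples as benign teaches the model to wave similar scans through.

```python
# Toy sketch of training-data poisoning via label flipping.
# Features, values, and the classifier choice are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier

# Features per event: [requests_per_minute, distinct_ports_contacted]
X = [[5, 1], [8, 2], [6, 1],       # normal traffic
     [300, 40], [280, 35],         # port-scan traffic
     [500, 2]]                     # brute-force traffic
y_clean    = ["benign", "benign", "benign", "attack", "attack", "attack"]
y_poisoned = ["benign", "benign", "benign", "benign", "benign", "attack"]  # scans relabeled

probe = [[290, 45]]  # a new port scan that the defender should catch

clean    = DecisionTreeClassifier(random_state=0).fit(X, y_clean)
poisoned = DecisionTreeClassifier(random_state=0).fit(X, y_poisoned)

print(clean.predict(probe))      # ['attack']
print(poisoned.predict(probe))   # ['benign'] -- the poisoned model ignores the scan
```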

“Low and Slow” Attacks 

Some machine learning systems detect “attacks” based on anomalies that differ significantly from the norm.  These systems will often build a baseline model over time by learning to accept anything that is not too anomalous and reject anything that is. 

An attacker can influence these machine learning systems by causing small, steady changes to their internal models.  An attack that only differs slightly from the norm may be accepted and – after it has occurred enough times – be officially considered “legitimate”.  At this point, the attacker can make additional changes to continue to undermine the system and teach it to accept and ignore any malicious events. 
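
The sketch below shows how this can play out against a simplified self-updating anomaly detector.  The detector, its threshold, and the traffic values are all assumptions for illustration: a sudden jump is rejected, but a slow ramp of small, individually acceptable changes drags the baseline along until the same jump is accepted.

```python
# Sketch of a "low and slow" attack on a self-updating anomaly detector.
# The detector, threshold, and traffic values are simplified assumptions.
import statistics

class DriftingBaseline:
    """Flags values far from the learned baseline; accepted values update it."""
    def __init__(self, seed_values, threshold=3.0):
        self.history = list(seed_values)
        self.threshold = threshold

    def observe(self, value):
        mean = statistics.mean(self.history)
        stdev = statistics.stdev(self.history) or 1.0
        if abs(value - mean) > self.threshold * stdev:
            return "rejected"           # anomalous: not learned
        self.history.append(value)      # accepted values silently shift the baseline
        return "accepted"

detector = DriftingBaseline(seed_values=[100, 102, 98, 101, 99])

print(detector.observe(180))  # a sudden jump is rejected outright

# The attacker ramps up slowly instead, staying just inside the threshold...
value = 102
while value <= 180:
    detector.observe(value)
    value += 2

print(detector.observe(180))  # ...until the once-anomalous value is accepted
```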

Adversarial Machine Learning 

Machine learning algorithms are designed to build a model of a system: they observe data about it and base their classifications and decisions on those observations.

The problem is that machine learning algorithms are themselves systems with observable inputs and outputs.  This means that, given a machine learning algorithm, it is always possible to develop another algorithm that is designed to deceive or trick the first one.

For example, take the current LinkedIn content scoring algorithm.  LinkedIn promotes posts that have high user engagement under the assumption that these posts are valuable, insightful, etc.  One of the signals that LinkedIn uses to measure engagement is whether or not people click the “See more” button to read the rest of a post that overflows the initial view.  The logic is that LinkedIn can only tell whether a user actually read a post if they engage with it in some way.  While a user may not be inclined to leave a comment or respond to a post (especially since LinkedIn responses are largely “positive” and an “engaged” user may not feel positively about a post), clicking “See more” is a required step to read a long post.

Since this engagement algorithm is public knowledge, people have started optimizing their LinkedIn posts to maximize the number of times readers click “See more.”  That’s why so many LinkedIn posts are padded with empty lines so that only a single leading statement is visible by default.  Curiosity drives users to click “See more” even when the post provides little or no real value.  As a result, LinkedIn is more annoying to read, and those who game the system likely have greater reach than those who post valuable content but choose not to play along.

If humans can figure out this algorithm and optimize their LinkedIn posts accordingly, a machine learning algorithm certainly can.  While LinkedIn post optimization is a relatively benign application of adversarial machine learning, applying the same approach to the AI used in cybersecurity and similar contexts has far more threatening implications.
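
The sketch below illustrates one simple form of this idea against a hypothetical security classifier: an evasion search that treats the trained detector as a black box and repeatedly nudges a malicious input until the detector's verdict flips.  The features, data, classifier, and greedy search strategy are all assumptions made for the sake of illustration; it is a simpler stand-in for the adversarial model-versus-model scenario described above.

```python
# Sketch of a black-box evasion attack: query the detector's outputs and
# search for a modified malicious input that it classifies as benign.
# Features, data, and the greedy search are simplified assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per file: [payload_size_kb, num_obfuscated_strings]
X = np.array([[1, 0], [2, 1], [3, 0],        # benign samples
              [40, 12], [35, 9], [50, 15]])  # malicious samples
y = np.array([0, 0, 0, 1, 1, 1])             # 1 = malicious

detector = LogisticRegression().fit(X, y)

sample = np.array([45.0, 13.0])              # starts out clearly malicious
print(detector.predict([sample]))            # [1]

# Greedy search: nudge one feature at a time in whatever direction lowers the
# detector's "malicious" score, using only its observable outputs.
for _ in range(200):
    if detector.predict([sample])[0] == 0:
        break
    candidates = []
    for i in range(len(sample)):
        candidate = sample.copy()
        candidate[i] -= 1.0                  # small change to one feature
        score = detector.predict_proba([candidate])[0][1]
        candidates.append((score, candidate))
    _, sample = min(candidates, key=lambda c: c[0])

print(sample)                      # the modified input that crosses the boundary
print(detector.predict([sample]))  # [0] -- now classified as benign
```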

Securing Machine Learning 

Machine learning algorithms can be an invaluable tool with a wide variety of applications.  The ability to extract patterns and derive models from data is useful across many industries.

However, it is important to remember that machine learning algorithms can be attacked, rendering the models that they create and the classifications that they make untrustworthy.  The next post in this series discusses the challenge of securing AI.
