In the last post, we discussed an outline of AI powered cyber attacks and their defence strategies. In this post, we will discuss a specific type of attack which is called adversarial attack.
Adversarial attacks are not common now because there are not many deep learning systems in production. But soon, we expect that they will increase. Adversarial attacks are easy to describe. In In 2014, a group of researchers found that by adding a small amount of carefully constructed noise, it was possible to fool CNN/ computer vision. For example, as below, we start with an image of a panda, which is correctly recognised as a “panda” with 57.7% confidence. But by adding the noise, the same image is recognised as a gibbon with 99.3% confidence. For the human eye, both images look the same but for the neural network, the result is entirely different. This type of an attack is called an adversarial attack and it has implications for self driving cars where traffic signs could be spoofed.
Source: Explaining and Harnessing Adversarial Examples, Goodfellow et al, ICLR 2015.
There are three scenarios for this type of attack:
The majority of attacks including the above mentioned takes place in the training phase are carried out by directly altering the dataset to learn, influence, or corrupt the model. Based on the adversarial capabilities, attack tactics are divided into the following categories:
During testing, adversarial attacks do not interact with the intended model, but rather push it to provide inaccurate results. The quantity of knowledge about the model available to the opponent determines the effectiveness of an assault. These assaults are classified as either Whitebox or Blackbox attacks. We present a formal specification of a training procedure for a machine learning model before considering these assaults.
An adversary in a Whitebox assault on a machine learning model has complete knowledge of the model used (for example, the type of neural network and the number of layers). The attacker knows what algorithm was used in training (for example, gradient descent optimization) and has access to the training data distribution. He also understands the whole trained model architecture's parameters . This information is used by the adversary to analyze the feature space in which the model may be vulnerable, i.e., where the model has a high mistake rate. The model is then exploited by modifying an input utilizing the adversarial example creation method, which we'll go through later. A Whitebox assault that has access to internal model weights is equivalent to a very strong adversarial attack.
Black Box Attacks
A Blackbox attack assumes no prior knowledge of the model and exploits it using information about the settings and prior inputs. ‘In an oracle attack, for example, the adversary investigates a model by supplying a series of carefully constructed inputs and observing outputs.
Adversarial learning poses a serious danger to machine learning applications in the real world. Although there are certain countermeasures, none of them can be a one-size-fits-all solution to all problems. The machine learning community still hasn't come up with a sufficiently robust design to counter these adversarial attacks.