
Could an explainable model be inherently less secure?

By ajitjaokar
Machine Learning Is Not Proof Against Hacking

In a week when privacy is very much on the agenda, we ask how we can protect AI models. This is already a mature field, but it is still not on the radar of most developers. Recently, ENISA published a report called Securing Machine Learning Algorithms (link below), which gives a good summary of the key threats involved.

We first explain the threats and then list the vulnerabilities mapped to each of them.

Evasion

A type of attack in which the attacker works on the ML algorithm’s inputs to find small perturbations leading to large modifications of its outputs (e.g. decision errors). It is as if the attacker created an optical illusion for the algorithm. Such modified inputs are often called adversarial examples.
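
The vulnerability list later in this post mentions FGSM (the Fast Gradient Sign Method), one of the simplest ways to craft such adversarial examples. The sketch below is purely illustrative and assumes a differentiable PyTorch classifier; the model, loss and epsilon value are placeholders.

```python
# A rough FGSM sketch, assuming a differentiable PyTorch classifier `model`,
# a batched input tensor `x` with values in [0, 1], and integer label tensor.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, label, epsilon=0.03):
    """Craft an adversarial example by stepping along the sign of the gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # A small, bounded perturbation in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixel values in a valid range
```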

Oracle

A type of attack in which the attacker explores a model by providing a series of carefully crafted inputs and observing its outputs. These attacks can be a precursor to more harmful ones, such as evasion or poisoning. Example: an attacker studies the set of input-output pairs and uses the results to retrieve training data.
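
As a rough sketch of this idea, the snippet below queries a black-box model, records the input-output pairs and fits a local surrogate that the attacker can study offline. The `query_model` function is hypothetical, and random probes are used only for brevity; real attacks choose their queries far more carefully.

```python
# Sketch of an oracle-style attack, assuming a hypothetical `query_model(x)`
# function that stands in for the victim's prediction API and returns a label.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_surrogate(query_model, n_queries=1000, n_features=20):
    X = np.random.uniform(-1.0, 1.0, size=(n_queries, n_features))
    y = np.array([query_model(x) for x in X])       # observed input-output pairs
    surrogate = DecisionTreeClassifier().fit(X, y)  # local copy to study offline
    return surrogate
```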

Poisoning

A type of attack in which the attacker alters the data or the model to modify the ML algorithm’s behaviour in a chosen direction (e.g. to sabotage its results, or to insert a backdoor). It is as if the attacker conditioned the algorithm according to its motivations. Example: massively indicating to an image recognition algorithm that images of dogs are in fact cats, so that it learns to interpret them this way.
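
A minimal sketch of what backdoor-style poisoning could look like in code, assuming grayscale training images of shape (N, H, W) with values in [0, 1] and integer labels; the trigger pattern, target class and poisoning fraction are arbitrary choices for illustration.

```python
# Sketch of backdoor-style poisoning: stamp a small trigger on a fraction of
# the training images and relabel them as the target class, so the trained
# model learns to associate the trigger with that class.
import numpy as np

def poison_dataset(images, labels, target_class=0, fraction=0.05):
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * fraction)
    idx = np.random.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0     # 3x3 white trigger patch in a corner
    labels[idx] = target_class      # mislabel the poisoned samples
    return images, labels
```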

Label modification

An attack in which the attacker corrupts the labels of training data.
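
For illustration only, here is a sketch of random label flipping on a NumPy label array; the number of classes and the flip rate are assumed values.

```python
# Sketch of label modification: randomly reassign a fraction of training labels.
import numpy as np

def flip_labels(labels, n_classes=10, flip_rate=0.1):
    labels = labels.copy()
    n_flip = int(len(labels) * flip_rate)
    idx = np.random.choice(len(labels), size=n_flip, replace=False)
    labels[idx] = np.random.randint(0, n_classes, size=n_flip)
    return labels
```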

Model or data disclosure

This threat refers to the possibility of leakage of all or partial information about the model. Example: the outputs of an ML algorithm are so verbose that they give away information about its configuration (or leak sensitive data).
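
To make the "verbose outputs" example concrete, the sketch below contrasts a response that returns the full probability vector with one that returns only the top-1 label; the API shape is an assumption of mine, not something taken from the report.

```python
# Returning every class probability reveals far more about the model's
# decision surface than a hardened top-1 response does.
import numpy as np

def verbose_response(probs, classes):
    # Leaky: every class probability is exposed to the caller.
    return {"probabilities": dict(zip(classes, probs.tolist()))}

def hardened_response(probs, classes):
    # Less leaky: only the predicted label is returned.
    return {"label": classes[int(np.argmax(probs))]}
```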

Data disclosure

This threat refers to a leak of data manipulated by ML algorithms. Such a leak can be explained by inadequate access control, a handling error by the project team, or simply by the fact that the entity that owns the model and the entity that owns the data are sometimes distinct.

Model disclosure

This threat refers to a leak of the internals (i.e. parameter values) of the ML model. This model leakage could occur because of human error or a contract with a third party whose security level is too low.

Compromise of ML application components

This threat refers to the compromise of a component or development tool of the ML application.

Example: compromise of one of the open-source libraries used by the developers to implement the ML algorithm.
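
One common mitigation for this kind of supply-chain risk, offered here as my own illustration rather than a recommendation from the report, is to verify the checksum of any downloaded artefact (for example a pre-trained model file) against a value recorded out of band before using it.

```python
# Verify the SHA-256 checksum of a downloaded artefact before loading it.
# The file path and expected hash are placeholders.
import hashlib

def verify_artifact(path, expected_sha256):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise RuntimeError(f"Checksum mismatch for {path}; refusing to load it")
```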

Failure or malfunction of ML application

This threat refers to ML application failure (e.g. denial of service due to bad input, unavailability due to a handling error). For example, if the service level of the third-party infrastructure hosting the ML application is too low for the business needs, the application may be regularly unavailable.
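
A simple guard against the "denial of service due to bad input" case, sketched here under an assumed input shape and value range, is to validate requests before they reach the (more expensive) inference step.

```python
# Minimal input guard; the expected shape and range are placeholders for
# whatever the model was actually trained on.
import numpy as np

EXPECTED_SHAPE = (28, 28)
VALUE_RANGE = (0.0, 1.0)

def validate_input(x: np.ndarray) -> np.ndarray:
    if x.shape != EXPECTED_SHAPE:
        raise ValueError(f"Unexpected input shape {x.shape}")
    if not np.isfinite(x).all():
        raise ValueError("Input contains NaN or infinite values")
    lo, hi = VALUE_RANGE
    if x.min() < lo or x.max() > hi:
        raise ValueError("Input values outside the expected range")
    return x
```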

Human error

The different stakeholders of the model can make mistakes that result in a failure or malfunction of the ML application. For example, due to a lack of documentation, they may use the application in use cases not initially foreseen.

Having outlined the threats, let us now look at the vulnerabilities that map to each of them.


Evasion               

  • Lack of detection of abnormal inputs (a simple detection sketch follows this list)
  • Poor consideration of evasion attacks in the model design and implementation
  • Lack of training based on adversarial attacks
  • Using a widely known model allowing the attacker to study it
  • Inputs totally controlled by the attacker, which allows input-output pairs to be collected
  • Use of adversarial examples crafted in white or grey box conditions (e.g. FGSM…)
  • Too much information available on the model
  • Too much information about the model given in its outputs
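
To make the first item in the list above concrete, here is a deliberately crude sketch of abnormal-input detection: flag queries whose top-class probability falls below a threshold. The probability interface and the threshold are assumptions, and real deployments would use stronger out-of-distribution or adversarial-input detectors.

```python
# Flag predictions that are too uncertain to act on blindly.
import numpy as np

def is_suspicious(probs, threshold=0.6):
    """Return True if the top-class probability is below the chosen threshold."""
    return float(np.max(probs)) < threshold
```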

Oracle  

  • The model allows private information to be retrieved
  • Too much information about the model given in its outputs
  • Too much information available on the model
  • Lack of consideration of the attacks to which ML applications could be exposed
  • Lack of security process to maintain a good security level of the components of the ML application
  • Weak access protection mechanisms for ML model components

Poisoning             

  • Lack of data for increasing robustness to poisoning
  • Poor access rights management
  • Poor data management
  • Undefined indicators of proper functioning, making compromise identification complex
  • Lack of consideration of the attacks to which ML applications could be exposed
  • Use of uncontrolled data
  • Use of unsafe data or models (e.g. with transfer learning)
  • Lack of control for poisoning
  • No detection of poisoned samples in the training dataset
  • Weak access protection mechanisms for ML model components
  • Use of unreliable sources to label data

Model or data disclosure              

  • Existence of unidentified disclosure scenarios
  • Weak access protection mechanisms for ML model components
  • Lack of security process to maintain a good security level of the components of the ML application
  • Unprotected sensitive data on test environments

Data disclosure

  • Too much information about the model given in its outputs
  • The model can allow private information to be retrieved
  • Disclosure of sensitive data for ML algorithm training
  • Too much information available on the model
  • Too much information about the model given in its outputs

Compromise of ML application components        

  • Too much information available on the model
  • Existence of several vulnerabilities because the ML application was not included in the process for integrating security into projects
  • Use of vulnerable components (among the whole supply chain)
  • Too much information about the model given in its outputs
  • Existence of unidentified compromise scenarios
  • Undefined indicators of proper functioning, making compromise identification complex
  • Bad practices due to a lack of cybersecurity awareness
  • Lack of security process to maintain a good security level of the components of the ML application
  • Weak access protection mechanisms for ML model components
  • Existence of several vulnerabilities because ML specificities are not integrated into existing policies
  • Existence of several vulnerabilities because the ML application does not comply with security policies
  • Contract with a low security third party

Failure or malfunction of ML application               

  • ML application not integrated in the cyber-resilience strategy
  • Existence of unidentified failure scenarios
  • Undefined indicators of proper functioning, making malfunction identification complex
  • Lack of explainability and traceability of decisions taken
  • Lack of security process to maintain a good security level of the components of the ML application
  • Existence of several vulnerabilities because ML specificities are not integrated in existing policies
  • Contract with a low security third party
  • Application not compliant with applicable regulations

Human error

  • Poor access rights management
  • Lack of documentation on the ML application
  • Denial of service due to inconsistent data or a sponge example (an input crafted to maximise the model’s energy or latency cost)
  • Use of uncontrolled data
  • Cybersecurity incident not reported to incident response teams
  • Lack of cybersecurity awareness

But all this raises a curious question for me: could an explainable model be inherently less secure? In other words, does the model become easier to deceive the more that is known about its internal workings and data?

Maybe a topic for a Ph.D. researcher!

The full report is available here: Securing Machine Learning Algorithms.