# Evasion attacks against Machine Learning models

As seen in class, machine learning models can be fooled by *adversarial examples*, samples artificially crafted to redirect the output of the victim towards a desired result.
To recap, adversarial examples are computed by solving an optimization problem:
$$
\begin{aligned}
  \min_{\boldsymbol{\delta}} \quad & L(\boldsymbol{x} + \boldsymbol{\delta}, y; \boldsymbol{\theta}) \\
  \text{subject to} \quad & \|\boldsymbol{\delta}\|_p \le \epsilon \\
  & \boldsymbol{l}_b \preccurlyeq \boldsymbol{x} + \boldsymbol{\delta} \preccurlyeq \boldsymbol{l}_u
\end{aligned}
$$

where $L$ is a loss function of choice, $\boldsymbol{x}$ is the sample to misclassify, with original label $y$, $\boldsymbol{\theta}$ are the parameters of the model, $\epsilon$ is the maximum allowed perturbation, and $\boldsymbol{l}_b, \boldsymbol{l}_u$ are the input-space bounds that every sample must satisfy (for instance, image pixels must be clipped to $[0, 1]$ or $[0, 255]$, otherwise the result is not a valid image).
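The two constraints can be checked directly on a candidate perturbation. The following NumPy sketch uses a hypothetical helper, `is_feasible`, with scalar bounds standing in for $\boldsymbol{l}_b, \boldsymbol{l}_u$ (here $[0, 1]$, the same range the data is rescaled to in this notebook); passing `p=np.inf` selects the $L_\infty$ norm.

```python
import numpy as np


def is_feasible(x, delta, eps, p=2, lb=0.0, ub=1.0):
    """Hypothetical helper: True if x + delta satisfies both constraints.

    Checks the budget ||delta||_p <= eps and the element-wise box
    constraint lb <= x + delta <= ub.
    """
    x_adv = x + delta
    within_budget = np.linalg.norm(delta, ord=p) <= eps
    within_bounds = np.all((x_adv >= lb) & (x_adv <= ub))
    return bool(within_budget and within_bounds)


x = np.array([0.2, 0.9])
print(is_feasible(x, np.array([0.1, 0.05]), eps=0.2))           # True
print(is_feasible(x, np.array([0.0, 0.2]), eps=0.3))            # False: 0.9 + 0.2 exceeds the upper bound
print(is_feasible(x, np.array([0.3, 0.0]), eps=0.2))            # False: L2 budget exceeded
print(is_feasible(x, np.array([0.15, 0.0]), eps=0.1, p=np.inf)) # False: L-inf norm 0.15 > 0.1
```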

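We use a minimization because we want to decrease the score that the classifier assigns to the true class $y$, hence causing a generic (untargeted) misclassification; $L$ must therefore be chosen so that lower values mean lower confidence in $y$. With a standard loss such as the cross-entropy on $y$, the same goal is reached by *maximizing* the loss, i.e., by minimizing its negative.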
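When $L$ is the cross-entropy of the true class, its derivative with respect to the class scores is $\boldsymbol{p} - \boldsymbol{t}$, where $\boldsymbol{p}$ is the softmax of the scores and $\boldsymbol{t}$ is the one-hot encoding of $y$. The entry of the true class, $p_y - 1$, is negative, so lowering the true-class score raises the loss. This NumPy sketch (independent of SecML; all names are illustrative) computes that derivative for a two-class example.

```python
import numpy as np


def softmax(s):
    z = np.exp(s - s.max())  # shift for numerical stability
    return z / z.sum()


def cross_entropy(s, label):
    # loss of the true class: -log softmax(s)[label]
    return -np.log(softmax(s)[label])


def dloss_dscores(s, label):
    # analytic derivative of the cross-entropy w.r.t. the scores: p - onehot(label)
    grad = softmax(s)
    grad[label] -= 1.0
    return grad


s = np.array([2.0, -1.0])  # scores of a two-class model that favours class 0
label = 0                  # true class
g = dloss_dscores(s, label)
print(g)  # g[label] = p_label - 1 < 0: lowering the true-class score raises the loss
```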

Hence, implementing an attack requires not only an *optimization algorithm* but also all of these components: the loss, the perturbation budget, and the input-space bounds.
In this exercise, we will use the *projected gradient descent* (PGD) optimizer [1,2], implementing it step by step in SecML.
First, we create a simple two-dimensional dataset that we will use throughout this tutorial, and we fit an SVM classifier on it.

[1] Biggio et al., "Evasion attacks against machine learning at test time", ECML PKDD 2013, https://arxiv.org/abs/1708.06131
[2] Madry et al., "Towards deep learning models resistant to adversarial attacks", ICLR 2018, https://arxiv.org/abs/1706.06083

In [None]:
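from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

# two Gaussian blobs in 2D, one per class
X, y = make_blobs(n_samples=1000, n_features=2, centers=[[-1, -1], [1, 1]], cluster_std=0.5,
                  random_state=0)
# rescale both features to [0, 1]: these are the input-space bounds used by the attack
X = MinMaxScaler().fit_transform(X)
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='r', label='class 0')
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='b', label='class 1')
plt.legend()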

In [None]:
from secml.ml import CClassifierSVM
from secml.array import CArray

# SVM with the default (linear) kernel, trained on SecML wrappers of the NumPy arrays
clf = CClassifierSVM()
clf.fit(CArray(X), CArray(y))

# Projected Gradient Descent (PGD)

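The attack is formulated as follows. Starting from $\boldsymbol{x}'_0 = \boldsymbol{x}$, the $L_2$ version implemented in this notebook repeats, for $k = 0, \dots, T-1$,

$$
  \boldsymbol{x}'_{k+1} = \Pi\left(\boldsymbol{x}'_k - \alpha \frac{\nabla_{\boldsymbol{x}} L(\boldsymbol{x}'_k, y; \boldsymbol{\theta})}{\|\nabla_{\boldsymbol{x}} L(\boldsymbol{x}'_k, y; \boldsymbol{\theta})\|_2}\right)
$$

where $T$ is the number of iterations, $\alpha$ is the step size, and $\Pi$ projects its argument onto the $L_2$-ball of radius $\epsilon$ centered on $\boldsymbol{x}$ and then clips it to the bounds $\boldsymbol{l}_b, \boldsymbol{l}_u$.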

First, the attack is initialized by choosing a starting point for the descent and by specifying the maximum perturbation budget $\epsilon$, the step size $\alpha$, and the number of iterations.
At each iteration, the algorithm computes the gradient of the loss with respect to the input, and it updates the adversarial example by taking a step of size $\alpha$ along the normalized negative gradient.
Lastly, if the applied perturbation exceeds the budget $\epsilon$, the algorithm projects the sample back onto the $L_p$-ball of radius $\epsilon$ centered on the starting point, and it clips the result to the input-space bounds.
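The three steps (initialization, normalized gradient step, projection and clipping) can be sketched in plain NumPy, independently of SecML. This is a minimal sketch under illustrative assumptions: the classifier is a hypothetical linear model $f(\boldsymbol{x}) = \boldsymbol{w}^\top\boldsymbol{x} + b$, the objective being minimized is the score of the true class, and the function name `pgd_l2_untargeted_linear` is made up for this example.

```python
import numpy as np


def pgd_l2_untargeted_linear(x, label, w, b, eps, alpha, iterations, lb=0.0, ub=1.0):
    """Minimal L2 PGD against a hypothetical linear classifier f(x) = w @ x + b.

    The objective is the score of the true class (f(x) if label == 1,
    -f(x) if label == 0), which each step decreases.
    """
    x = np.asarray(x, dtype=float)
    x_adv = x.copy()
    path = [x_adv.copy()]
    sign = 1.0 if label == 1 else -1.0
    for _ in range(iterations):
        grad = sign * w                     # gradient of the true-class score w.r.t. x
        grad = grad / np.linalg.norm(grad)  # normalized L2 direction
        x_adv = x_adv - alpha * grad        # descent step of length alpha
        delta = x_adv - x
        norm = np.linalg.norm(delta)
        if norm > eps:                      # project onto the L2-ball of radius eps around x
            x_adv = x + delta / norm * eps
        x_adv = np.clip(x_adv, lb, ub)      # enforce the input-space bounds
        path.append(x_adv.copy())
    label_adv = int(w @ x_adv + b > 0)      # predicted class of the final point
    return x_adv, label_adv, np.array(path)


w = np.array([1.0, 1.0])
b = -1.0                                    # decision boundary: x1 + x2 = 1
x0 = np.array([0.7, 0.6])                   # f(x0) = 0.3 > 0, so x0 is in class 1
x_adv, label_adv, path = pgd_l2_untargeted_linear(x0, 1, w, b, eps=0.5, alpha=0.05, iterations=20)
print(label_adv, np.linalg.norm(x_adv - x0))
```

With these values the point crosses the boundary after five steps (the boundary is at $L_2$ distance $0.3/\sqrt{2} \approx 0.21$ from `x0`), and the remaining steps are absorbed by the projection, which caps the perturbation norm at $\epsilon = 0.5$.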

A graphical explanation of the projected gradient descent is reported below.

TODO insert here 11-step plot

In [None]:
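from secml.ml.classifiers.loss import CLossClassification


def pgd_l2_untargeted(x: CArray, y: CArray, loss_fun: CLossClassification, model: CClassifierSVM, eps: float,
                      alpha: float,
                      iterations: int):
    x_adv = x.deepcopy()
    path = CArray.zeros((iterations, x.shape[1]))  # one row per iteration, to plot the attack path
    for i in range(iterations):
        # scores of the current point (use the `model` argument, not the global `clf`)
        logits = model.decision_function(x_adv)
        # derivative of the loss w.r.t. the score of class 0 (`pos_label=0`)
        loss_grad = loss_fun.dloss(y, logits, pos_label=0)
        # the linear SVM decision function w^T x + b has gradient w w.r.t. the input
        svm_grad = model.w
        gradient = svm_grad * loss_grad  # chain rule
        gradient /= gradient.norm()  # unit L2 norm: each step has length alpha before projection
        x_adv = x_adv - alpha * gradient
        # projection onto the L2-ball of radius eps centered on x
        if (x_adv - x).norm() > eps:
            difference = x_adv - x
            difference = difference / difference.norm() * eps
            x_adv = x + difference
        x_adv = x_adv.clip(0, 1)  # input-space bounds of the rescaled data
        path[i, :] = x_adv
    return x_adv, model.predict(x_adv), path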
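The product `svm_grad * loss_grad` applies the chain rule through a linear decision function $f(\boldsymbol{x}) = \boldsymbol{w}^\top\boldsymbol{x} + b$. As a sanity check independent of SecML, this NumPy sketch assumes a binary scorer whose two class scores are $(-f(\boldsymbol{x}), f(\boldsymbol{x}))$ (an illustrative assumption), computes the input gradient of the cross-entropy analytically, and compares it with central finite differences. The weights and the point are arbitrary, and all function names are hypothetical.

```python
import numpy as np


def linear_scores(x, w, b):
    # assumed binary scorer: class-0 score is -f(x), class-1 score is f(x)
    f = w @ x + b
    return np.array([-f, f])


def ce_loss(x, label, w, b):
    # cross-entropy of the true class: -log softmax(scores)[label]
    s = linear_scores(x, w, b)
    s = s - s.max()  # numerical stability
    return -(s[label] - np.log(np.exp(s).sum()))


def ce_grad_x(x, label, w, b):
    # chain rule: dL/dx = sum_k (p_k - t_k) * ds_k/dx, with ds_0/dx = -w and ds_1/dx = w
    s = linear_scores(x, w, b)
    p = np.exp(s - s.max())
    p /= p.sum()
    t = np.eye(2)[label]
    return (p[0] - t[0]) * (-w) + (p[1] - t[1]) * w


def numerical_grad(fun, x, h=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g


w, b = np.array([1.5, -0.5]), 0.2
x = np.array([0.3, 0.8])
for label in (0, 1):
    analytic = ce_grad_x(x, label, w, b)
    numeric = numerical_grad(lambda z: ce_loss(z, label, w, b), x)
    print(label, analytic, np.allclose(analytic, numeric, atol=1e-6))
```

Both terms of the chain rule are multiples of $\boldsymbol{w}$, so the loss gradient is always parallel to $\boldsymbol{w}$: the loss only decides the sign and magnitude of the step, and the magnitude is discarded once the gradient is normalized.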

In [None]:
from secml.ml.classifiers.loss import CLossCrossEntropy

x = CArray(X[0, :]).atleast_2d()
y_true = CArray([y[0]])  # SecML losses expect one label per sample, not a one-hot vector
iterations = 10
eps = 1
alpha = 0.05
loss_func = CLossCrossEntropy()

print(f"Starting point has label: {y[0]}")
x_adv, y_adv, attack_path = pgd_l2_untargeted(x, y_true, loss_func, clf, eps, alpha, iterations)
print(f"Adversarial point has label: {y_adv.item()}")

In [None]:
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='r')
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='b')
plt.scatter(attack_path.tondarray()[:,0], attack_path.tondarray()[:,1], c='g')

In [None]:
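from secml.ml import CClassifier
from secml.ml.classifiers.loss import CLossClassification


def pgd_l2_untargeted(x: CArray, y: CArray, loss_fun: CLossClassification, model: CClassifier, eps: float,
                      alpha: float, iterations: int):
    # generic version: relies on the model's own gradient, not on the SVM weights `w`
    x_adv = x.deepcopy()
    path = []
    for i in range(iterations):
        scores = model.decision_function(x_adv)
        # derivative of the loss w.r.t. the score of the true class
        # (the original call passed the classifier as `pos_label`, raising an IndexError)
        loss_grad = loss_fun.dloss(y, scores)
        # gradient of the true-class score w.r.t. the input
        clf_grad = model.grad_f_x(x_adv, y=y.item())
        gradient = clf_grad * loss_grad  # chain rule
        gradient /= gradient.norm()
        # the cross-entropy of the true class must grow: step along +gradient
        # (i.e., gradient descent on the negative loss)
        x_adv = x_adv + alpha * gradient
        if (x_adv - x).norm() > eps:
            difference = x_adv - x
            difference = difference / difference.norm() * eps
            x_adv = x + difference
        x_adv = x_adv.clip(0, 1)
        path.append(x_adv)
    return x_adv, model.predict(x_adv), path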
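For a generic multi-class model, the input gradient of the cross-entropy sums the contributions of all class scores: $\nabla_{\boldsymbol{x}} L = J^\top(\boldsymbol{p} - \boldsymbol{t})$, where $J$ is the Jacobian of the scores with respect to the input. Multiplying the true-class score gradient by the true-class loss derivative keeps only one of these terms. The NumPy sketch below assumes a hypothetical 3-class linear model $s(\boldsymbol{x}) = W\boldsymbol{x} + \boldsymbol{b}$ with random weights (so $J = W$), checks the full gradient against finite differences, and prints the cosine similarity between the full and the true-class-only directions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # hypothetical 3-class linear model s(x) = W x + b
b = rng.normal(size=3)
x = np.array([0.4, 0.6])
label = 2                     # true class


def probs(x):
    s = W @ x + b
    e = np.exp(s - s.max())   # shift for numerical stability
    return e / e.sum()


def loss(x):
    # cross-entropy of the true class
    return -np.log(probs(x)[label])


p = probs(x)
t = np.eye(3)[label]
full_grad = W.T @ (p - t)                   # J^T (p - t), with J = W for a linear model
true_class_only = (p[label] - 1.0) * W[label]  # keeps only the true-class score term

# central finite differences as a reference
h = 1e-6
numeric = np.array([(loss(x + h * e) - loss(x - h * e)) / (2 * h) for e in np.eye(2)])

cos = full_grad @ true_class_only / (np.linalg.norm(full_grad) * np.linalg.norm(true_class_only))
print(np.allclose(full_grad, numeric, atol=1e-6), cos)
```

For two classes with scores $(-f, f)$ the two directions coincide after normalization, while with more classes they can differ, so the true-class-only update approximates the full gradient direction.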

In [47]:
from secml.ml.classifiers.loss import CLossCrossEntropy

x = CArray(X[0, :]).atleast_2d()
y_true = CArray([y[0]])  # one label per sample
iterations = 10
eps = 0.1
alpha = 0.05
loss_func = CLossCrossEntropy()

print(f"Starting point has label: {y[0]}")
# the function returns the adversarial point, its predicted label, and the attack path
x_adv, y_adv, attack_path = pgd_l2_untargeted(x, y_true, loss_func, clf, eps, alpha, iterations)
print(f"Adversarial point has label: {y_adv.item()}")

# TODO plot adv path on boundary
