# Evasion attacks against Machine Learning models

As seen in class, machine learning models can be fooled by *adversarial examples*, samples artificially crafted to redirect the output of the victim towards a desired result.
These attacks can be either *targeted*, when the attacker wants to produce a specific misclassification (e.g. a dog must be recognized as a cat), or *untargeted*, when the attacker is satisfied with any misclassification (e.g. a dog can be recognized as anything but a dog).

Both targeted and untargeted attacks are formulated as optimization problems.
Targeted attacks can be written as:
$$
  \min_{\boldsymbol{\delta}} L(\boldsymbol{x} + \boldsymbol{\delta}, y_t; \boldsymbol{\theta})
  \\
  \text{s.t.} \quad ||\boldsymbol{\delta}||_p \le \epsilon
  \\
  \text{s.t.} \quad \boldsymbol{l}_b \preccurlyeq \boldsymbol{x} + \boldsymbol{\delta} \preccurlyeq \boldsymbol{l}_u
$$

where $L$ is a loss function of choice, $\boldsymbol{x}$ is the sample to misclassify, $y_t$ is the target label that the attacker wants the model to output, $\boldsymbol{\theta}$ are the parameters of the model, $\epsilon$ is the maximum allowed perturbation, and $\boldsymbol{l}_b,\boldsymbol{l}_u$ are the input-space bounds that must be enforced on samples (for instance, images must be clipped to [0, 1] or [0, 255] to avoid producing a corrupted image).

Untargeted attacks can be written as:

$$
  \max_{\boldsymbol{\delta}} L(\boldsymbol{x} + \boldsymbol{\delta}, y; \boldsymbol{\theta})
  \\
  \text{s.t.} \quad ||\boldsymbol{\delta}||_p \le \epsilon
  \\
  \text{s.t.} \quad \boldsymbol{l}_b \preccurlyeq \boldsymbol{x} + \boldsymbol{\delta} \preccurlyeq \boldsymbol{l}_u
$$

where we change the minimization to a *maximization*, since we want to maximize the error of the classifier w.r.t. the true label $y$.
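
To make the difference between the two objectives concrete, here is a minimal sketch that evaluates a cross-entropy loss on hand-picked scores for a three-class model. The scores, the class names, and the helper `cross_entropy` are assumptions made up for illustration; they do not come from SecML or from the classifier trained below.

```python
import math

def cross_entropy(scores, label):
    """Cross-entropy of softmax(scores) w.r.t. an integer class label."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]

# Hypothetical scores for the classes (dog=0, cat=1, bird=2)
clean_scores = [3.0, 1.0, 0.5]  # the clean sample is confidently a "dog"
adv_scores = [1.0, 3.0, 0.5]    # the perturbed sample is pushed towards "cat"

y, y_t = 0, 1  # true label and target label

# Untargeted view: the loss w.r.t. the true label y should grow
loss_true_clean = cross_entropy(clean_scores, y)
loss_true_adv = cross_entropy(adv_scores, y)

# Targeted view: the loss w.r.t. the target label y_t should shrink
loss_target_clean = cross_entropy(clean_scores, y_t)
loss_target_adv = cross_entropy(adv_scores, y_t)

print(f"loss w.r.t. y:   {loss_true_clean:.3f} -> {loss_true_adv:.3f}")
print(f"loss w.r.t. y_t: {loss_target_clean:.3f} -> {loss_target_adv:.3f}")
```

The same perturbation increases the loss on the true label and decreases the loss on the target label, which is why one problem is a maximization and the other a minimization.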

We start by implementing *untargeted* evasion attacks, for which we need to define two main components: the *optimization algorithm* and the *loss function* of the attack. While the latter can be *any* distance function, we will now describe one particular optimizer.
In this exercise, we will use the *projected gradient descent* (PGD) [1,2] optimizer, implementing it step by step in SecML.
First, we create a simple 2D dataset that we will use in this tutorial, and we fit an SVM classifier on it.

[1] Biggio et al. "Evasion attacks against machine learning at test time", ECML PKDD 2013, https://arxiv.org/abs/1708.06131
[2] Madry et al. "Towards deep learning models resistant to adversarial attacks", ICLR 2018, https://arxiv.org/pdf/1706.06083.pdf

In [None]:
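from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler

# Two Gaussian blobs in 2D, one per class
X, y = make_blobs(n_samples=1000, n_features=2, centers=[[-1, -1], [1, 1]], cluster_std=0.5,
                  random_state=0)
# Rescale every feature to [0, 1], the input-space bounds enforced by the attack
X = MinMaxScaler().fit_transform(X)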

In [None]:
from secml.ml import CClassifierSVM
from secml.array import CArray

clf = CClassifierSVM()
clf.fit(CArray(X), CArray(y))

from secml.figure import CFigure

fig = CFigure()
fig.sp.scatter(X[y == 0, 0], X[y == 0, 1], c='r')
fig.sp.scatter(X[y == 1, 0], X[y == 1, 1], c='b')
fig.sp.plot_decision_regions(clf, plot_background=False,
                             n_grid_points=200)  # helper function for plotting the decision function

# Projected Gradient Descent (PGD)

The attack is formulated as follows:

TODO insert here algorithm for PGD

First, the attack is initialized by choosing a starting point for the descent, and by specifying the maximum perturbation budget $\epsilon$, the step size $\alpha$, and the number of iterations.
At each iteration, the strategy computes the gradient of the loss with respect to the input, and it updates the adversarial example by following the computed direction.
Lastly, if the applied perturbation exceeds the perturbation budget $\epsilon$, the algorithm projects the sample back onto the $L_p$-ball of radius $\epsilon$ centered on the starting point.
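
The three steps above (gradient step, projection onto the $L_2$-ball, clipping to the input bounds) can be sketched without any library. This is only an illustration under simplifying assumptions: the victim is a hypothetical linear classifier $f(\boldsymbol{x}) = \boldsymbol{w}^\top\boldsymbol{x} + b$ with a logistic loss, and the names `project_l2` and `pgd_l2_linear` are invented here. It is not the SecML implementation.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))

def project_l2(x_adv, x0, eps):
    """Project x_adv onto the L2 ball of radius eps centered on x0."""
    diff = [a - b for a, b in zip(x_adv, x0)]
    norm = l2_norm(diff)
    if norm <= eps:
        return list(x_adv)  # already inside the ball
    return [b + d / norm * eps for b, d in zip(x0, diff)]

def pgd_l2_linear(x0, y, w, b, eps, alpha, iterations, lb=0.0, ub=1.0):
    """Untargeted L2 PGD against f(x) = w.x + b with logistic loss (y in {0, 1})."""
    x_adv = list(x0)
    path = [list(x0)]
    for _ in range(iterations):
        score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
        prob = 1.0 / (1.0 + math.exp(-score))  # P(class 1 | x_adv)
        # Gradient of the logistic loss w.r.t. the input: (prob - y) * w
        grad = [(prob - y) * wi for wi in w]
        norm = l2_norm(grad)
        if norm == 0.0:
            break  # no direction to follow
        # Ascent step: untargeted attacks maximize the loss on the true label
        x_adv = [xi + alpha * g / norm for xi, g in zip(x_adv, grad)]
        # Enforce the perturbation budget, then the input-space bounds
        x_adv = project_l2(x_adv, x0, eps)
        x_adv = [min(max(xi, lb), ub) for xi in x_adv]
        path.append(list(x_adv))
    return x_adv, path

# Hypothetical model: the decision boundary is the line x1 + x2 = 1
w, b = [1.0, 1.0], -1.0
x0, y = [0.2, 0.3], 0  # negative score, so x0 is classified as class 0
x_adv, path = pgd_l2_linear(x0, y, w, b, eps=0.5, alpha=0.05, iterations=10)

adv_score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
print("adversarial point:", x_adv, "score:", adv_score)
```

Clipping is applied after the projection: since the starting point already lies inside the box, clipping cannot move the sample farther away from it, so both constraints hold at the end of each iteration.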

A graphical explanation of the projected gradient descent is reported below.

TODO insert here 11-step plot

In [None]:
from secml.ml.classifiers.loss import CLossCrossEntropy

def pgd_l2_untargeted(x: CArray, y: CArray, model: CClassifierSVM, eps: float,
                      alpha: float,
                      iterations: int):
    loss_func = CLossCrossEntropy()
    x_adv = x.deepcopy()
    y_true = CArray([0, 0])
    y_true[y] = 1
    path = CArray.zeros((iterations + 1, x.shape[1]))
    path[0, :] = x_adv
    for i in range(iterations):
        # Query the classifier passed as argument, not the global `clf`
        logits = model.decision_function(x_adv)

        loss = loss_func.dloss(y_true, logits, pos_label=y) # BEWARE of the decision function of the SVM!
        svm_grad = model.gradient(x_adv, logits)
        gradient = svm_grad * loss
        gradient /= gradient.norm()

        x_adv = x_adv + alpha * gradient
        if (x_adv - x).norm() > eps:
            difference = x_adv - x
            difference = difference / difference.norm() * eps
            x_adv = x + difference
        x_adv = x_adv.clip(0, 1)
        path[i + 1, :] = x_adv
    return x_adv, model.predict(x_adv), path

In [None]:
index = 0
x = CArray(X[index, :]).atleast_2d()
y_true = y[index]
iterations = 10
eps = 0.5
alpha = 0.05

print(f"Starting point has label: {y_true}")
x_adv, y_adv, attack_path = pgd_l2_untargeted(x, y_true, clf, eps, alpha, iterations)
print(f"Adversarial point has label: {y_adv.item()}")

In [None]:
from secml.figure import CFigure
from secml.optim.constraints import CConstraintL2

fig = CFigure()
fig.sp.scatter(X[y == 0, 0], X[y == 0, 1], c='r')
fig.sp.scatter(X[y == 1, 0], X[y == 1, 1], c='b')
fig.sp.plot_decision_regions(clf, plot_background=False, n_grid_points=200)
constraint = CConstraintL2(center=x, radius=eps)
fig.sp.plot_path(attack_path)
fig.sp.plot_constraint(constraint)

Evasion achieved!
As you can see, a manual implementation is error-prone and hard to handle (for instance, what happens if we choose another sample for the attack? Why might it stop working?).
Hence, SecML provides many attack wrappers that accomplish the same task with far less effort.

In [None]:
solver_params = {
    'eta': alpha,
    'max_iter': iterations,
    'eps': 1
}

from secml.adv.attacks.evasion import CAttackEvasionPGD
pgd_attack = CAttackEvasionPGD(
    classifier=clf,
    double_init=False,
    distance='l2',
    dmax=eps,
    lb=0, ub=1,
    solver_params=solver_params,
    y_target=None)

# Run the evasion attack on x0
y_pred_pgd, _, adv_ds_pgd, _ = pgd_attack.run(x, y_true)

print("Starting point has label: ", y_true.item())
print(f"Adversarial point has label: {y_pred_pgd.item()}")

In [None]:
from secml.figure import CFigure
from secml.optim.constraints import CConstraintL2

fig = CFigure()
fig.sp.scatter(X[y == 0, 0], X[y == 0, 1], c='r')
fig.sp.scatter(X[y == 1, 0], X[y == 1, 1], c='b')
fig.sp.plot_decision_regions(clf, plot_background=False, n_grid_points=200)
constraint = CConstraintL2(center=x, radius=eps)
fig.sp.plot_path(pgd_attack.x_seq)
fig.sp.plot_constraint(constraint)

# Exercise

Now, we will create new data and a new classifier, and we will perform both attacks through SecML.
Your task consists of using the library to create adversarial examples, in both targeted and untargeted settings.

In [None]:
from secml.ml.features import CNormalizerMinMax
from secml.ml.classifiers.multiclass import CClassifierMulticlassOVA
from secml.ml.kernels import CKernelRBF

random_state = 999

n_features = 2  # Number of features
n_samples = 1000  # Number of samples
centers = [[-2, 0], [2, -2], [2, 2]]  # Centers of the clusters
cluster_std = 0.4  # Standard deviation of the clusters

from secml.data.loader import CDLRandomBlobs
dataset = CDLRandomBlobs(n_features=n_features, 
                         centers=centers, 
                         cluster_std=cluster_std,
                         n_samples=n_samples,
                         random_state=random_state).load()


# Normalize the data
dataset.X = CNormalizerMinMax().fit_transform(dataset.X)
clf = CClassifierMulticlassOVA(CClassifierSVM, C=0.1, kernel=CKernelRBF(gamma=10))
clf.fit(dataset.X, dataset.Y)

fig = CFigure()
fig.sp.scatter(dataset.X[dataset.Y==0,0], dataset.X[dataset.Y==0,1], c='r')
fig.sp.scatter(dataset.X[dataset.Y==1,0], dataset.X[dataset.Y==1,1], c='b')
fig.sp.scatter(dataset.X[dataset.Y==2,0], dataset.X[dataset.Y==2,1], c='g')
fig.sp.plot_decision_regions(clf, plot_background=False, n_grid_points=200)

In [None]:
iterations = 100
alpha = 0.01
y_target = 0

index = 10
x, y = dataset.X[index,:], dataset.Y[index]

solver_params = {
    'eta': alpha,
    'max_iter': iterations,
    'eps': 1e-4
}

from secml.adv.attacks.evasion import CAttackEvasionPGD
pgd_attack_t = CAttackEvasionPGD(
    classifier=clf,
    double_init=False,
    distance='l2',
    dmax=eps,
    lb=0, ub=1,
    solver_params=solver_params,
    y_target=y_target)

# Run the evasion attack on x0
y_pred_pgd_t, _, adv_ds_pgd_t, _ = pgd_attack_t.run(x, y)

print("Starting point has label: ", y.item())
print(f"Adversarial point has label: {y_pred_pgd_t.item()}")

fig = CFigure()
fig.sp.scatter(dataset.X[dataset.Y==0,0], dataset.X[dataset.Y==0,1], c='r')
fig.sp.scatter(dataset.X[dataset.Y==1,0], dataset.X[dataset.Y==1,1], c='b')
fig.sp.scatter(dataset.X[dataset.Y==2,0], dataset.X[dataset.Y==2,1], c='g')

fig.sp.plot_decision_regions(clf, plot_background=False, n_grid_points=200)

fig.sp.plot_fun(pgd_attack_t.objective_function, plot_levels=False,
                multipoint=True, n_grid_points=20)
constraint = CConstraintL2(center=x, radius=eps)

fig.sp.plot_path(pgd_attack_t.x_seq)
fig.sp.plot_constraint(constraint)

In [None]:
iterations = 100
alpha = 0.01
y_target = None

index = 100
x, y = dataset.X[index,:], dataset.Y[index]
print(x,y)

solver_params = {
    'eta': alpha,
    'max_iter': iterations,
    'eps': 1e-4
}

from secml.adv.attacks.evasion import CAttackEvasionPGD
pgd_attack_u = CAttackEvasionPGD(
    classifier=clf,
    double_init=False,
    distance='l2',
    dmax=eps,
    lb=0, ub=1,
    solver_params=solver_params,
    y_target=y_target)

# Run the evasion attack on x0
y_pred_pgd_u, _, adv_ds_pgd_u, _ = pgd_attack_u.run(x, y)

print("Starting point has label: ", y.item())
print(f"Adversarial point has label: {y_pred_pgd_u.item()}")

fig = CFigure()
fig.sp.scatter(dataset.X[dataset.Y==0,0], dataset.X[dataset.Y==0,1], c='r')
fig.sp.scatter(dataset.X[dataset.Y==1,0], dataset.X[dataset.Y==1,1], c='b')
fig.sp.scatter(dataset.X[dataset.Y==2,0], dataset.X[dataset.Y==2,1], c='g')

fig.sp.plot_decision_regions(clf, plot_background=False, n_grid_points=200)

fig.sp.plot_fun(pgd_attack_u.objective_function, plot_levels=False, multipoint=True, n_grid_points=20)

constraint = CConstraintL2(center=x, radius=eps)
fig.sp.plot_path(pgd_attack_u.x_seq)
fig.sp.plot_constraint(constraint)
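
When checking the results of the exercise, it can help to verify that an adversarial point satisfies both constraints of the optimization problem, namely the $L_2$ budget and the input-space bounds. The helper below is a minimal sketch with an invented name (`is_valid_adversarial`); it assumes that the points have already been converted to flat Python lists of floats, and it is not part of SecML.

```python
import math

def is_valid_adversarial(x0, x_adv, eps, lb=0.0, ub=1.0, tol=1e-6):
    """Check the L2 budget and the box bounds for a single adversarial point."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_adv, x0)))
    in_ball = dist <= eps + tol  # perturbation within the budget
    in_box = all(lb - tol <= a <= ub + tol for a in x_adv)  # inside [lb, ub]
    return in_ball and in_box

# Hypothetical points, chosen only to exercise the three cases
print(is_valid_adversarial([0.2, 0.3], [0.5, 0.7], eps=0.5))  # on the ball surface
print(is_valid_adversarial([0.2, 0.3], [0.9, 0.9], eps=0.5))  # budget exceeded
print(is_valid_adversarial([0.9, 0.9], [1.1, 0.9], eps=0.5))  # outside the box
```

The small tolerance `tol` absorbs floating-point error, since a projected point typically sits exactly on the surface of the ball.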