## Adversarial Examples

Adversarial examples, also known as evasion attacks, are intentionally perturbed input samples crafted to mislead classification at test time [1,2].

These attacks are formulated as optimization problems that can be solved via gradient-based optimizers.

Here, we will compute adversarial examples by minimizing a loss function $L$ on a target label $y_t$ (different from the true class), under manipulation constraints, as given below, where $\mathbf x_0$ is the initial image and $\theta$ denotes the model parameters:

$$
\begin{eqnarray}
    \mathbf x^\star \in {\arg\min}_{\mathbf x} && L(\mathbf x, y_t, \theta) \, \\
    {\rm s.t. } && \| \mathbf x- \mathbf x_0\|_2 \leq \varepsilon \, , \\
    && \mathbf x_{\rm lb} \preceq \mathbf x \preceq \mathbf x_{\rm ub} \, .
\end{eqnarray}
$$

The first constraint bounds the size of the adversarial perturbation by $\varepsilon$, measured in $\ell_2$ norm.
The second is a box constraint that keeps every pixel of the adversarial image within the valid range, 0-255 (or 0-1 if the input pixels are scaled).
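
The two constraints can be enforced with a small helper. The following is a minimal sketch rather than the notebook's own code: the function name `project_l2_box` and the toy tensors are illustrative assumptions. It clips first and then rescales the perturbation, which returns a feasible point whenever $\mathbf x_0$ itself lies inside the box, although it is not necessarily the exact Euclidean projection onto the intersection of the two sets.

```python
import torch


def project_l2_box(x, x0, eps, lb=0.0, ub=1.0):
    """Return a point that satisfies both the box and the l2 constraint (hypothetical helper)."""
    # Box constraint: clip every pixel into [lb, ub].
    x = torch.clamp(x, lb, ub)
    # l2 constraint: if the perturbation is longer than eps, rescale it to length eps.
    delta = x - x0
    norm = torch.norm(delta, p=2)
    if norm > eps:
        x = x0 + delta * (eps / norm)
    return x


# Toy 2x2 "image" with an exaggerated perturbation.
x0 = torch.full((1, 1, 2, 2), 0.5)
x = x0 + torch.tensor([[[[3.0, 0.0], [0.0, -4.0]]]])
x_proj = project_l2_box(x, x0, eps=0.25)
print(torch.norm(x_proj - x0, p=2).item())  # distance from x0 after enforcing the constraints
```

Because the rescaled point lies on the segment between $\mathbf x_0$ and the clipped point, it stays inside the box as long as $\mathbf x_0$ does.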


We solve this problem with a *projected* gradient-descent algorithm below, which, after each gradient step, projects the adversarial image back onto the feasible domain so that the constraints remain satisfied.

The attack is meant to manipulate the input pixels of the initial image. To this end, we will need to explicitly account for the transform/scaling performed before passing the input sample to the neural network. In particular, at each iteration, we will map the image from the pixel space onto the transformed/scaled space, update the attack point along the gradient direction in that space, project the modified image back onto the input pixel space (using an inverse-transformation function), and apply box and $\ell_2$ constraints in the input space.
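
One iteration of the procedure described above might look like the sketch below. It rests on several assumptions that do not come from the notebook: the preprocessing is taken to be a simple standardization with illustrative constants `MEAN` and `STD`, the helpers `transform`, `inverse_transform`, and `attack_step` are hypothetical names, and a small linear model stands in for the real network.

```python
import torch

# Assumed preprocessing: pixels in [0, 1] are standardized with a mean and a std.
# These constants are illustrative and are not read from the notebook's model.
MEAN, STD = 0.1307, 0.3081


def transform(x_pixels):
    return (x_pixels - MEAN) / STD


def inverse_transform(x_scaled):
    return x_scaled * STD + MEAN


def attack_step(model, loss_fn, x_pixels, x0_pixels, target, step_size, eps):
    # 1. Map the image from pixel space to the transformed space seen by the network.
    x_scaled = transform(x_pixels).detach().requires_grad_(True)
    loss = loss_fn(model(x_scaled), target)
    (grad,) = torch.autograd.grad(loss, x_scaled)
    # 2. Take a normalized gradient-descent step in the transformed space.
    grad = grad / torch.clamp(torch.norm(grad, p=2), min=1e-12)
    x_scaled = x_scaled.detach() - step_size * grad
    # 3. Map the updated point back to pixel space.
    x_new = inverse_transform(x_scaled)
    # 4. Apply the box and l2 constraints in pixel space.
    x_new = torch.clamp(x_new, 0.0, 1.0)
    delta = x_new - x0_pixels
    norm = torch.norm(delta, p=2)
    if norm > eps:
        x_new = x0_pixels + delta * (eps / norm)
    return x_new


torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16, 3))  # toy stand-in network
loss_fn = torch.nn.CrossEntropyLoss()
x0 = torch.rand(1, 1, 4, 4)  # toy 4x4 image with pixels in [0, 1]
target = torch.tensor([2])

x_adv = x0.clone()
for _ in range(20):
    x_adv = attack_step(model, loss_fn, x_adv, x0, target, step_size=0.1, eps=0.5)
print(torch.norm(x_adv - x0, p=2).item())
```

Keeping the attack point in pixel space between iterations means that $\varepsilon$ and the box bounds keep their meaning in pixel units, whatever scaling the network expects.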


**References**
1. C. Szegedy et al., Intriguing Properties of Neural Networks, ICLR 2014, https://arxiv.org/abs/1312.6199
2. B. Biggio et al., Evasion Attacks against Machine Learning at Test Time, ECML PKDD 2013, https://arxiv.org/abs/1708.06131

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zangobot/adversarial_challenge/blob/main/chall1.ipynb)

In [None]:
!npx degit https://github.com/zangobot/adversarial_challenge --force
!pip install -r requirements.txt

In [None]:
import numpy as np
import torch
from torchvision import datasets

from mnist_model import SimpleNet

net = SimpleNet().load_pretrained_mnist('mnist_net.pth')
mnist = datasets.MNIST(root='.', download=True, train=False, transform=net.get_transform())
sample, label = mnist[350]
sample = sample.view((1, *sample.shape))
target_label = torch.LongTensor([2])

print(f'Original label: {label}')
iterations = 2000
eps = 5
loss = torch.nn.CrossEntropyLoss()
step_size = 1

x_adv = sample.clone()
x_adv = x_adv.requires_grad_()

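for i in range(iterations):
    scores = net(x_adv)
    output = loss(scores, target_label)
    output.backward()

    # Normalized gradient-descent step on the targeted loss
    gradient = x_adv.grad
    gradient = gradient / torch.clamp(torch.norm(gradient, p=2), min=1e-12)
    x_adv.data = x_adv.data - step_size * gradient

    # Box constraint: keep every pixel in [0, 1]
    x_adv.data = torch.clamp(x_adv.data, 0, 1)

    # l2 constraint: rescale the perturbation onto the eps-ball around the sample
    delta = x_adv.data - sample.data
    delta_norm = torch.norm(delta, p=2)
    if delta_norm > eps:
        x_adv.data = sample.data + eps * delta / delta_norm
    x_adv.grad.zero_()

# Re-evaluate on the final point: the scores from the last iteration predate the last update
with torch.no_grad():
    scores = net(x_adv)
    output = loss(scores, target_label)
print(f'Adv loss: {output.item()}')
print(f'Adv label: {scores.argmax(dim=-1)}')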