# Basic Adversarial Examples

## Copyright notice

This version (c) 2021 Fabian Offert, [MIT License](LICENSE).

## Colab Setup

Run the commands below only if you imported this notebook into Google Colab. Also **go to Runtime/Change runtime type and pick "GPU" as the hardware accelerator!**

In [None]:
!rm -rf adversarial # In case this is re-run
!git clone https://github.com/zentralwerkstatt/adversarial
!cp ./adversarial/erika_299x299.jpg ./
!cp ./adversarial/giant_panda_299x299.jpg ./
!cp ./adversarial/*synset_words.txt ./

In [None]:
!nvidia-smi # Check what kind of GPU we got

## Imports

We are using PyTorch as our deep learning framework.

In [None]:
import torch as t
import torch.nn as nn
import torch.nn.functional as F
import torchvision as tv

import numpy as np

from scipy.ndimage import gaussian_filter, median_filter # scipy.ndimage.filters is a deprecated namespace
from skimage.restoration import denoise_bilateral, denoise_tv_chambolle
import PIL.Image, PIL.ImageChops

import os
import random
from io import BytesIO
from IPython import display

## Model to attack

For most tasks we attack the widely used InceptionV3 architecture, pre-trained on ImageNet. We also load VGG16 and VGG19 to (optionally) test the "universality" of adversarial attacks. **Colab users note: this may take a while, as the pre-trained weights have to be downloaded in the background!**

In [None]:
device = t.device("cuda:0" if t.cuda.is_available() else "cpu") # Use GPU if available
f1 = tv.models.inception_v3(pretrained=True).to(device)
f2 = tv.models.vgg16(pretrained=True).to(device)
f3 = tv.models.vgg19(pretrained=True).to(device)
# Test mode: we do not want to train the model (i.e. change its weights) at any point
f1.eval()
f2.eval()
f3.eval()
model_names = {'f1':'Inception V3', 'f2':'VGG16', 'f3': 'VGG19'}

## Helper functions

Among other things, these helper functions allow us to convert between PyTorch tensors, NumPy arrays, and PIL images.

In [None]:
# Show an image within a Jupyter environment
# Can do PyTorch tensors, NumPy arrays, and PIL images
def show_img(img, title='', fmt='jpeg'):
    if type(img) is np.ndarray:
        img = PIL.Image.fromarray(img)
    elif type(img) is t.Tensor:
        img = deprocess(img)
    out = BytesIO()
    if title: print(title)
    img.save(out, fmt)
    display.display(display.Image(data=out.getvalue()))

# PyTorch is channels first: ToTensor() converts an HWC image in the 0/255 range to a CHW tensor in the 0./1. range
preprocess = tv.transforms.Compose([tv.transforms.ToTensor()])
    
# Reverse of preprocess, PyTorch tensor to PIL image
def deprocess(tensor):
    # Clone tensor first, otherwise we are NOT making a copy by using .cpu()!
    img = t.clone(tensor)
    img = img.cpu().data.numpy().squeeze() # Get rid of batch dimension
    img = img.transpose((1, 2, 0)) # Channels first to channels last
    
    # We are not using ImageNet images as input
    # mean = np.array([0.485, 0.456, 0.406]) 
    # std = np.array([0.229, 0.224, 0.225]) 
    # img = std * img + mean

    # No clipping, adversarial regulation should take care of this
    # img = np.clip(img, 0, 1)
    
    # 0./1. range to 0./255. range
    img *= 255
    
    img = img.astype(np.uint8)
    img = PIL.Image.fromarray(img)
    return img

# Return a gray square PIL image
def gray_square(size):
    # Gray square, -1./1. range
    img = np.random.normal(0, 0.01, (size, size, 3)) 
    
    # -1./1. range to 0./255. range
    img /= 2.
    img += 0.5
    img *= 255.

    img = img.astype(np.uint8)
    img = PIL.Image.fromarray(img)
    return img

# Load ImageNet classes
with open('synset_words.txt') as synset_words_file:
    synset_words = synset_words_file.readlines()
for i, line in enumerate(synset_words):
    synset_words[i] = line.replace(' ', '_').replace(',', '_').lower().strip()

# Classify an image with the target model 
# Can do PyTorch tensors and PIL images
def predict(img, model):
    if type(img) is t.Tensor:
        preds = model(img.to(device))
    else:
        preds = model(preprocess(img).unsqueeze(0).to(device))
    preds_softmax_np = F.softmax(preds, dim=1).cpu().data.numpy()
    # Returns class no., class name, and prediction confidence
    return preds_softmax_np.argmax(), synset_words[preds_softmax_np.argmax()], preds_softmax_np.max()

# "Rolling" list: whenever an item is added, the first item is discarded
def destructive_append(l,i):
    l=l[1:]
    l.append(i)
    return l

# PyTorch and skimage use different channel ordering
def pytorch_to_skimage(img):
    # No batch dimension
    img = img[0]
    # Channels last
    img = np.swapaxes(img, 0, 2)
    return img
    
def skimage_to_pytorch(img):
    # Channels first
    img = np.swapaxes(img, 0, 2)
    # Skimage returns double precision, PyTorch expects single precision
    img = img.astype(np.float32)
    # Add the batch dimension back
    img = np.expand_dims(img, 0)
    return img

# Filters for feature visualization
def filter_median(npimg, params):
    npimg = median_filter(npimg, size=(1, 1, params['fsize'], params['fsize']))  
    return npimg

def filter_bilateral(npimg, params):
    npimg = pytorch_to_skimage(npimg)
    npimg = denoise_bilateral(npimg, sigma_color=0.05, sigma_spatial=15, multichannel=True)
    npimg = skimage_to_pytorch(npimg)
    return npimg

def filter_TV(npimg, params):
    npimg = pytorch_to_skimage(npimg)
    npimg = denoise_tv_chambolle(npimg, weight=0.1, multichannel=True)
    npimg = skimage_to_pytorch(npimg)
    return npimg
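
The layout conversions used by these helpers can be checked on a small non-square array. The following NumPy-only sketch is illustrative: the array `chw` and the helper names `to_uint8_hwc` and `from_uint8_hwc` are assumptions and not part of the notebook, and unlike `deprocess` the sketch clips and rounds before casting. It shows that `transpose((1, 2, 0))` yields a height-width-channels layout, whereas `swapaxes(0, 2)` yields width-height-channels, which is why `pytorch_to_skimage` and `skimage_to_pytorch` should always be used as a pair.

```python
import numpy as np

def to_uint8_hwc(chw):
    """Channels-first float image in [0, 1] -> channels-last uint8 image."""
    hwc = chw.transpose((1, 2, 0))
    # Clip before casting so out-of-range values cannot wrap around
    return (np.clip(hwc, 0.0, 1.0) * 255).round().astype(np.uint8)

def from_uint8_hwc(hwc):
    """Channels-last uint8 image -> channels-first float32 image in [0, 1]."""
    return (hwc.astype(np.float32) / 255.0).transpose((2, 0, 1))

# Non-square dummy image: 3 channels, height 2, width 4
chw = np.linspace(0.0, 1.0, 3 * 2 * 4, dtype=np.float32).reshape(3, 2, 4)

hwc = to_uint8_hwc(chw)                # height, width, channels
swapped = np.swapaxes(chw, 0, 2)       # width, height, channels (image is transposed)
restored = np.swapaxes(swapped, 0, 2)  # swapping again undoes the transposition
roundtrip = from_uint8_hwc(hwc)        # equal to chw up to 8-bit quantization

print(hwc.shape)      # (2, 4, 3)
print(swapped.shape)  # (4, 2, 3)
```

Because the filters in this notebook operate on square images and the two `swapaxes` calls cancel out, the transposition is harmless here; it would matter for filters that treat rows and columns differently.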

## Attacks

### Fast Gradient Sign Method (FGSM)

From: Goodfellow, I.J., Shlens, J., Szegedy, C., 2014. [Explaining and harnessing adversarial examples](https://arxiv.org/abs/1412.6572). arXiv preprint arXiv:1412.6572.

In [None]:
def fgsm(img, neuron, model):
    
    η = 0.007 # Perturbation amount
    
    # Preprocess input image and put on GPU
    input = preprocess(img).unsqueeze(0).to(device).requires_grad_()
    
    # Reset gradients
    model.zero_grad()
    
    # Forward pass
    x = model(input)
    
    # Use true label as optimum
    loss = nn.CrossEntropyLoss()
    # nn.CrossEntropyLoss() counter-intuitively does NOT take a one-hot vector as target!
    label = t.tensor([neuron], dtype=t.long).to(device)
    cost = loss(x, label)
    cost.backward()
    
    attack_img = input + η*input.grad.sign()
    attack_img = t.clamp(attack_img, 0.0, 1.0)
    
    return attack_img
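
The update in `fgsm` is a single step of size η along the sign of the input gradient of the loss. A minimal NumPy sketch of the same step against a toy linear softmax classifier may make this concrete. The model (`W`, `b`), the helper `fgsm_linear`, and the step size of 0.05 are assumptions chosen for illustration; the notebook itself attacks InceptionV3 through autograd.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgsm_linear(x, y, W, b, eta):
    """One FGSM step against a linear softmax classifier (logits = W @ x + b)."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    # Gradient of the cross-entropy loss with respect to the input
    grad_x = W.T @ (p - onehot)
    # Step in the direction that increases the loss, then clip to the valid range
    return np.clip(x + eta * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))        # 3 classes, 8 "pixels" (toy dimensions)
b = np.zeros(3)
x = rng.uniform(0.2, 0.8, size=8)
y = int(np.argmax(W @ x + b))      # treat the current prediction as the true label

x_adv = fgsm_linear(x, y, W, b, eta=0.05)
loss_before = -np.log(softmax(W @ x + b)[y])
loss_after = -np.log(softmax(W @ x_adv + b)[y])
print(loss_before, loss_after)     # the loss on the true label increases
```

Every input component moves by at most η, so the perturbation is bounded in the L-infinity norm even though it touches all pixels at once.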

### Expectation over Transformation Method (ETO) / Adversarial Patch

From: Brown, T.B., Mané, D., Roy, A., Abadi, M., Gilmer, J., 2017. [Adversarial patch](https://arxiv.org/abs/1712.09665). arXiv preprint arXiv:1712.09665; and Athalye, A., Engstrom, L., Ilyas, A., Kwok, K., 2017. [Synthesizing robust adversarial examples](https://arxiv.org/abs/1707.07397). arXiv preprint arXiv:1707.07397.

Note: Only random patch placement is implemented at this point; random backgrounds, scaling, and rotation are still to do.

In [None]:
def eto(img, neuron, model):
        
    LR = 0.4 # Yosinski learning rate
    MIN_CONFIDENCE = 0.99 # Minimum prediction confidence to stop optimization
    L2 = 1e-4 # Yosinski weight decay
    
    bg = PIL.Image.open('erika_299x299.jpg') # Fixed background for now
    assert bg.width == bg.height # Assert background is square
    assert img.width == img.height # Assert image to be optimized is square
    assert img.width < bg.width # Assert image to be optimized is smaller than background

    input = preprocess(img).unsqueeze(0).to(device).requires_grad_()
    npbg = preprocess(bg).unsqueeze(0).data.cpu().numpy() # To tensor and back so we do not have to deal with channels etc.
    optimizer = t.optim.SGD([input], lr=LR, weight_decay=L2)
    
    max_shift = npbg.shape[2]-input.shape[2]
    
    # We want to keep a running average, as the patch location is constantly changing
    mem_confidence = 100
    acc_confidence = [0.0 for i in range(mem_confidence)]
    avg_confidence = 0.0
    confidence = 0.0
    
    i = 0  
    while avg_confidence < MIN_CONFIDENCE:
        
        optimizer.zero_grad()
        
        # TO DO: load random background image
        # TO DO: scaling and rotation
        npimg = input.data.cpu().numpy()
        x_shift = np.random.randint(max_shift)
        y_shift = np.random.randint(max_shift)
        npcombined = npbg.copy()
        npcombined[:,:,y_shift:y_shift+img.height,x_shift:x_shift+img.width] = npimg
        input.data = t.from_numpy(npcombined).to(device)
        
        x = model(input)
        loss = -x[:,neuron] # -x as the optimizer minimizes the loss and we want to maximize the target class score (logit)
    
        preds_softmax_np = F.softmax(x, dim=1).cpu().data.numpy()
        confidence = preds_softmax_np[:,neuron]
        
        acc_confidence = destructive_append(acc_confidence, confidence)
        avg_confidence = sum(acc_confidence)/mem_confidence
            
        i+=1
        
        if i%50 == 0: 
            print(f'Iterations: {i}, loss: {loss.item()}, pred.: {synset_words[preds_softmax_np.argmax()]}, avg. conf.: {avg_confidence}')
            
        loss.backward()
        optimizer.step()
        
        npcombined = input.data.cpu().numpy()
        npimg = npcombined[:,:,y_shift:y_shift+img.height,x_shift:x_shift+img.width]
        input.data = t.from_numpy(npimg).to(device)
    
    return input
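
The stopping criterion in `eto` is a mean over the last `mem_confidence` confidences, kept up to date with `destructive_append`. As a hedged alternative, the same rolling window can be sketched with `collections.deque`. The class name `RollingMean` and the window size of 4 are assumptions for illustration only.

```python
from collections import deque

class RollingMean:
    """Mean over the last `size` values; slots not yet filled count as `fill`."""

    def __init__(self, size, fill=0.0):
        self.window = deque([fill] * size, maxlen=size)

    def update(self, value):
        self.window.append(value)  # the oldest value is dropped automatically
        return sum(self.window) / len(self.window)

rolling = RollingMean(size=4)
means = [rolling.update(c) for c in [1.0, 1.0, 1.0, 1.0, 0.0]]
print(means)  # [0.25, 0.5, 0.75, 1.0, 0.75]
```

Since the window starts out filled with zeros, the mean only approaches the threshold after close to a full window of high-confidence iterations, which keeps the loop from stopping on a single lucky patch position.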

### Iterative Least-Likely Class Method (ILLC)

From: Kurakin, A., Goodfellow, I. and Bengio, S., 2016. [Adversarial examples in the physical world](https://arxiv.org/abs/1607.02533). arXiv preprint arXiv:1607.02533.

In [None]:
def illc(img, neuron, model):
    
    LR = 0.01 # Learning rate
    η = 0.01 # Max perturbation amount
    MIN_CONFIDENCE = 0.99 # Minimum prediction confidence to stop optimization

    input = preprocess(img).unsqueeze(0).to(device).requires_grad_()
    original = input.data.cpu().numpy()
    optimizer = t.optim.SGD([input], lr=LR)
    
    i = 0
    confidence = 0.0
    
    while confidence < MIN_CONFIDENCE:
        
        optimizer.zero_grad()
        
        x = model(input)
        loss = -x[:,neuron] # -x as the optimizer minimizes the loss and we want to maximize the target class score (logit)
        
        preds_softmax_np = F.softmax(x, dim=1).cpu().data.numpy()
        confidence = preds_softmax_np[:,neuron]

        i+=1

        if i%50 == 0: 
            print(f'Iterations: {i}, loss: {loss.item()}, pred.: {synset_words[preds_softmax_np.argmax()]}, conf.: {confidence}')

        loss.backward()
        optimizer.step()
        
        # Regular and adversarial clipping on the CPU (don't mess with GPU tensors in place!)
        img = input.data.cpu().numpy()
        
        clipped = np.where(img > original + η, original + η, img)
        clipped = np.where(clipped < original - η, original - η, clipped)
        # We could also use t.clamp() but as we are manipulating CPU representations anyway...
        clipped = np.where(clipped > 1.0, 1.0, clipped)
        clipped = np.where(clipped < 0.0, 0.0, clipped)
        
        input.data = t.from_numpy(clipped).to(device)
    
    return input
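
The two clipping stages in `illc` first project the image back into the η-ball around the original (in the L-infinity sense) and then into the valid pixel range. A compact sketch of that projection with `np.clip` follows; the helper name `project` and the toy values are assumptions, and the chain of `np.where` calls is reproduced only to compare results.

```python
import numpy as np

def project(adv, original, eta):
    """Clip `adv` to the eta-ball (L-infinity) around `original`, then to [0, 1]."""
    adv = np.clip(adv, original - eta, original + eta)
    return np.clip(adv, 0.0, 1.0)

eta = 0.01
original = np.array([0.0, 0.5, 0.995, 0.3])
stepped = np.array([-0.2, 0.52, 1.3, 0.3])  # values after a hypothetical optimizer step

projected = project(stepped, original, eta)

# Same sequence of np.where calls as in illc(), for comparison
reference = np.where(stepped > original + eta, original + eta, stepped)
reference = np.where(reference < original - eta, original - eta, reference)
reference = np.where(reference > 1.0, 1.0, reference)
reference = np.where(reference < 0.0, 0.0, reference)

print(np.allclose(projected, reference))  # True
```

The order matters: clipping to the η-ball first and to [0, 1] second guarantees a valid image that is still within η of the original.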

### Generalized Gradient Ascent with Perceptual Filtering

In [None]:
def gradient_ascent(img, neuron, model):

    ITERATIONS = 2000
    # FILTERS = [{'function':filter_median, 'frequency':4, 'params':{'fsize':5}}] # Good parameters
    FILTERS = [{'function':filter_TV, 'frequency':20, 'params':{}}] # Good parameters
    JITTER = 32
    LR = 0.4
    L2 = 1e-4 # Yosinski weight decay
            
    input = preprocess(img).unsqueeze(0).to(device).requires_grad_()
    optimizer = t.optim.SGD([input], lr=LR, weight_decay=L2)
    
    for i in range(ITERATIONS):
        
        optimizer.zero_grad()
        
        # Centers the object in the image
        if JITTER:
            npimg = input.data.cpu().numpy() # To CPU and numpy
            ox, oy = np.random.randint(-JITTER, JITTER+1, 2)
            npimg = np.roll(np.roll(npimg, ox, -1), oy, -2) # Jitter
            input.data = t.from_numpy(npimg).to(device)

        x = model(input)
        loss = -x[:,neuron]

        preds_softmax_np = F.softmax(x, dim=1).cpu().data.numpy()
        confidence = preds_softmax_np[:,neuron]
                    
        if i%50 == 0: 
            print(f'Iterations: {i}, loss: {loss.item()}, pred.: {synset_words[preds_softmax_np.argmax()]}, conf.: {confidence}')

        loss.backward()
        optimizer.step()
        
        # Centers the object in the image
        if JITTER:
            npimg = input.data.cpu().numpy() # To CPU and numpy
            npimg = np.roll(np.roll(npimg, -ox, -1), -oy, -2) # Jitter
            input.data = t.from_numpy(npimg).to(device)
            
        # Stochastic clipping
        input.data[input.data > 1] = np.random.uniform(0, 1)
        input.data[input.data < 0] = np.random.uniform(0, 1)
        
        # Filtering
        for filter_ in FILTERS:
            if i != ITERATIONS - 1: # No regularization on last iteration for good quality output
                if i % filter_['frequency'] == 0:
                    npimg = input.data.cpu().numpy() # To CPU and numpy
                    npimg = filter_['function'](npimg, filter_['params'])
                    input.data = t.from_numpy(npimg).to(device)
        # Verbose
        if i%50 == 0:
            show_img(input)

    return input
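
The jitter step in `gradient_ascent` shifts the image cyclically before the forward pass and shifts it back afterwards. The NumPy sketch below checks on a dummy array that rolling by the negated offsets restores the original alignment. It also shows a per-pixel variant of stochastic clipping; this variant is an assumption for illustration, since `gradient_ascent` itself draws one random value per call for all out-of-range pixels.

```python
import numpy as np

rng = np.random.default_rng(0)
JITTER = 2  # small value for the toy example

# Dummy batch of one 3-channel 4x5 image (batch, channels, height, width)
img = rng.uniform(0.0, 1.0, size=(1, 3, 4, 5))

# Random offsets in [-JITTER, JITTER], applied to width (-1) and height (-2)
ox, oy = rng.integers(-JITTER, JITTER + 1, size=2)
jittered = np.roll(np.roll(img, ox, axis=-1), oy, axis=-2)

# ... the gradient step would happen here ...

# Rolling by the negated offsets restores the original alignment
restored = np.roll(np.roll(jittered, -ox, axis=-1), -oy, axis=-2)

# Per-pixel stochastic clipping: each out-of-range value gets its own random replacement
out = np.array([-0.3, 0.2, 1.7, 0.9])
mask = (out < 0.0) | (out > 1.0)
out[mask] = rng.uniform(0.0, 1.0, size=mask.sum())

print(np.array_equal(restored, img))  # True
```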

## Demos

### Load sample images

In [None]:
erika = PIL.Image.open('erika_299x299.jpg')
giant_panda = PIL.Image.open('giant_panda_299x299.jpg')
show_img(erika)
show_img(giant_panda)

### FGSM

In [None]:
# Regular image and predictions
show_img(giant_panda)
print(model_names['f1'], predict(giant_panda, f1))

# Attack
giant_panda_id = 388 # ImageNet class index of "giant panda" (the true label)
img = fgsm(giant_panda, giant_panda_id, f1)

# Adversarial image and predictions
show_img(img)
print(model_names['f1'], predict(img, f1))

### ILLC

In [None]:
# Regular image
show_img(erika)

# Attack
img = illc(erika, 1, f1) # 1 = Goldfish

# Adversarial image and prediction
show_img(img)
print(model_names['f1'], predict(img, f1))

In [None]:
# Amplify adversarial pattern
diff = PIL.ImageChops.difference(deprocess(img), erika)
show_img(diff)
show_img(np.array(diff)*50)
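
Multiplying a `uint8` difference image by 50 works as long as no difference exceeds 5, which should hold for the small η used in `illc`. For larger perturbations the product wraps around modulo 256. A hedged sketch of an amplification that saturates instead is given below; the helper name `amplify` and the sample values are assumptions.

```python
import numpy as np

def amplify(diff, factor):
    """Scale a uint8 difference image without integer wrap-around."""
    scaled = diff.astype(np.int32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

diff = np.array([0, 2, 5, 6], dtype=np.uint8)

wrapped = diff * np.uint8(50)  # uint8 arithmetic wraps modulo 256
safe = amplify(diff, 50)       # saturates at 255 instead

print(wrapped.tolist())  # [0, 100, 250, 44]
print(safe.tolist())     # [0, 100, 250, 255]
```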

### General gradient ascent for feature visualization

In [None]:
noise = gray_square(299)
img = gradient_ascent(noise, 1, f1)

### ETO

In [None]:
img = eto(gray_square(50), 1, f1) # 1 = Goldfish
show_img(img)

In [None]:
bg = PIL.Image.open('erika_299x299.jpg')
npbg = preprocess(bg).unsqueeze(0).data.cpu().numpy()
npimg = img.data.cpu().numpy()
max_shift = npbg.shape[2]-img.shape[2]

for i in range(1):
    x_shift = np.random.randint(max_shift)
    y_shift = np.random.randint(max_shift)
    npcombined = npbg.copy()
    npcombined[:,:,y_shift:y_shift+img.shape[2],x_shift:x_shift+img.shape[2]] = npimg
    show_img(t.from_numpy(npcombined))
    print(model_names['f1'], predict(t.from_numpy(npcombined), f1))
    # print(model_names['f2'], predict(t.from_numpy(npcombined), f2))
    # print(model_names['f3'], predict(t.from_numpy(npcombined), f3))
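
The patch placement used above and in `eto` can be isolated into a small helper. The sketch below is NumPy-only, and the function name `paste_patch` and the dummy shapes are assumptions. One deliberate difference: the notebook draws offsets from `[0, max_shift)`, so the patch never touches the right or bottom edge, whereas the sketch also allows the offset `max_shift` itself.

```python
import numpy as np

def paste_patch(background, patch, rng):
    """Paste `patch` (1, C, h, w) into a copy of `background` (1, C, H, W) at a random offset."""
    _, _, H, W = background.shape
    _, _, h, w = patch.shape
    # The upper bound is exclusive, so +1 also allows the patch to touch the far edge
    y = int(rng.integers(0, H - h + 1))
    x = int(rng.integers(0, W - w + 1))
    combined = background.copy()  # leave the background itself untouched
    combined[:, :, y:y + h, x:x + w] = patch
    return combined, (y, x)

rng = np.random.default_rng(0)
background = np.zeros((1, 3, 8, 8), dtype=np.float32)
patch = np.ones((1, 3, 3, 3), dtype=np.float32)

combined, (y, x) = paste_patch(background, patch, rng)
print(combined.shape, (y, x))
```

Returning the offset makes it possible to cut the patch back out of the combined image after an optimizer step, as `eto` does.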

## References

Athalye, A., Engstrom, L., Ilyas, A., & Kwok, K. (2017). [Synthesizing robust adversarial examples](https://arxiv.org/abs/1707.07397). arXiv preprint arXiv:1707.07397.

Brown, T.B., Mané, D., Roy, A., Abadi, M., & Gilmer, J. (2017). [Adversarial patch](https://arxiv.org/abs/1712.09665). arXiv preprint arXiv:1712.09665.

Carlini, N. & Wagner, D. (2017). [Adversarial examples are not easily detected: Bypassing ten detection methods](https://dl.acm.org/doi/pdf/10.1145/3128572.3140444). Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security.

Goodfellow, I.J., Shlens, J., & Szegedy, C. (2014). [Explaining and harnessing adversarial examples](https://arxiv.org/abs/1412.6572). arXiv preprint arXiv:1412.6572.

Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., & Madry, A. (2019). [Adversarial examples are not bugs, they are features](https://arxiv.org/pdf/1905.02175.pdf). arXiv preprint arXiv:1905.02175.

Kurakin, A., Goodfellow, I., & Bengio, S. (2016). [Adversarial examples in the physical world](https://arxiv.org/abs/1607.02533). arXiv preprint arXiv:1607.02533.

Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). [Universal adversarial perturbations](https://openaccess.thecvf.com/content_cvpr_2017/papers/Moosavi-Dezfooli_Universal_Adversarial_Perturbations_CVPR_2017_paper.pdf). In Proceedings of the CVPR (pp. 1765-1773).

Nguyen, A., Yosinski, J., & Clune, J. (2015). [Deep neural networks are easily fooled: High confidence predictions for unrecognizable images](https://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.html). In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 427–436).

Ruiz, N., Bargal, S. A., & Sclaroff, S. (2020). [Disrupting deepfakes: Adversarial attacks against conditional image translation networks and facial manipulation systems](https://arxiv.org/pdf/2003.01279.pdf). arXiv preprint arXiv:2003.01279.

Salman, H., Ilyas, A., Engstrom, L., Vemprala, S., Madry, A., & Kapoor, A. (2020). [Unadversarial Examples: Designing Objects for Robust Vision](https://arxiv.org/pdf/2012.12235.pdf). arXiv preprint arXiv:2012.12235.

Su, J., Vargas, D.V., & Sakurai, K. (2019). [One pixel attack for fooling deep neural networks](https://ieeexplore.ieee.org/abstract/document/8601309). IEEE Transactions on Evolutionary Computation, 23(5), pp. 828-841.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. & Fergus, R. (2013). [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199). arXiv preprint arXiv:1312.6199.