
# Closed-Box Evasion Answer Key
This answer key is configured such that you should be able to run the code here and see possible approaches to a working solution. For each topic, it will also link further resources, and go into more detail on certain code chunks. It is not meant to be edited. 

Use these answer keys as a guide as needed. Try to use the context here to work toward an answer before reaching for the solution.

**If you just want to see the answers, they're all tagged with "SOLUTION", CTRL+F your heart out.**

## Setup
The setup code must be run for the solutions to work properly. Review the breakdown of the setup code in the lab notebook for an explanation of each section.




In [None]:
import torch
from PIL import Image
from IPython import display

import pandas as pd
import torchvision
from torchvision import transforms

import numpy as np
import matplotlib.pyplot as plt

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

#load the model from the pytorch hub
model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', weights='MobileNet_V2_Weights.DEFAULT', verbose=False)

# Put model in evaluation mode
model.eval()

# put the model on a GPU if available, otherwise CPU
model.to(device);

# Define the transforms for preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),  # Resize so the shorter side is 256 pixels (aspect ratio is preserved)
    transforms.CenterCrop(224),  # Crop the image to 224x224 about the center
    transforms.ToTensor(),  # Convert the image to a PyTorch tensor
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],  # Normalize the image with the ImageNet dataset mean values
        std=[0.229, 0.224, 0.225]  # Normalize the image with the ImageNet dataset standard deviation values
    )
]);

unnormalize = transforms.Normalize(
   mean= [-m/s for m, s in zip([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])],
   std= [1/s for s in [0.229, 0.224, 0.225]]
)

with open("../data/labels.txt", 'r') as f:
    labels = [label.strip() for label in f.readlines()]

img = Image.open("../data/dog.jpg")
img_tensor = preprocess(img).unsqueeze(0)
img_tensor = img_tensor.to(device)

with torch.no_grad():
    output = model(img_tensor)

print(f"Image tensor on device:\n---------------\n{img_tensor.device}\n")
print(f"Inputs information:\n---------------\nshape:{img_tensor.shape}\nclass: {type(img_tensor)}\n")
print(f"Shape of outputs:\n---------------\n{output.shape}\n")
print(f"Pred Index:\n---------------\n{output[0].argmax()}\n")
print(f"Pred Label:\n---------------\n{labels[output[0].argmax()]}\n")

unnormed_img_tensor= unnormalize(img_tensor)

img_pil = transforms.functional.to_pil_image(unnormed_img_tensor[0])
img_pil.show()

## SimBA (Simple BlackBox Attack)
### Resources & Setup
- [SimBA Paper](https://arxiv.org/abs/1905.07121)
- [Attack implementation code](https://github.com/cg563/simple-blackbox-attack)

Start by reloading the image...

In [None]:
img = Image.open("../data/dog.jpg")
img_tensor = preprocess(img).unsqueeze(0)
img_tensor = img_tensor.to(device)

with torch.no_grad():
    output = model(img_tensor)

print(f"Image tensor on device:\n---------------\n{img_tensor.device}\n")
print(f"Inputs information:\n---------------\nshape:{img_tensor.shape}\nclass: {type(img_tensor)}\n")
print(f"Shape of outputs:\n---------------\n{output.shape}\n")
print(f"Pred Index:\n---------------\n{output[0].argmax()}\n")
print(f"Pred Label:\n---------------\n{labels[output[0].argmax()]}\n")

unnormed_img_tensor= unnormalize(img_tensor)

img_pil = transforms.functional.to_pil_image(unnormed_img_tensor[0])
img_pil.show()

## Attack
### Provided Code
The code below is provided in the lab and must be run for the exercise solution to work. 


In [None]:
n_masks = 1000
eta = 0.005

# Generate a tensor that is a collection of "masks"
# The tensor will have 1000 copies of tensors with the same shape as img_tensor
# The values will have a mean 0 and variance 1 and be scaled down by eta 
mask_collection = torch.randn((n_masks, *img_tensor.shape)).to(device) * eta

# initial mask with shape of img_tensor and values of 0
current_mask = torch.zeros_like(img_tensor).to(device)

# compute our starting index
starting_index = model(img_tensor).argmax(1)
print(f"Starting index is:\n---------------\n{starting_index}\n")

starting_class_score = model(img_tensor + current_mask)[0, starting_index.item()].item()
print(f"Starting class score is:\n---------------\n{starting_class_score}\n")

The untargeted attack is given in the lab. 

### SOLUTION: Exercise 1
We need the image of the dog to be classified as a robin. The intuition here is slightly different from the untargeted attack: we only apply masks from our candidates that improve the score of the _target_ class in the logits returned by the model.

This intuition can be visualized here.

![](../assets/2-evasion-simba.png)

In [None]:
# your code here
# Zero our current mask
current_mask = torch.zeros_like(img_tensor).to(device)

# Get our starting label index
starting_label = model(img_tensor).argmax(1).item()
current_label = starting_label

# Target class index
target_index = torch.tensor(labels.index('robin')).unsqueeze(0).to(device)

# Get our starting confidence score
best_score = model(img_tensor + current_mask)[0, target_index.item()].item()

# Run until we reclassify successfully ...
while current_label != target_index:

    # Select a random mask from the collection we created
    mask_candidate_idx = np.random.choice(len(mask_collection))
    mask_candidate = mask_collection[mask_candidate_idx]

    # Don't store gradient information while doing inference
    with torch.no_grad():
        output = model(img_tensor + current_mask + mask_candidate)
    
    # Based on our mask addition, get our new label and updated score
    current_label = output.argmax(1).item()
    new_score = output[0, target_index.item()].item()

    # If this candidate mask lowered the target class score, discard it and move on
    if new_score < best_score:
        continue

    # Write some monitoring for dopamine 
    print(f"Best score is: {best_score:4.6f} -- pred score is: {output[0, current_label].item()} -- prediction is: {current_label}  ", end='\r', flush=True)
    
    # Update our current score
    best_score = new_score
    
    # And update our mask
    current_mask += mask_candidate
                
print(f"\n\nWinner winner: {labels[output[0].argmax()]}")

_Help me understand_
- **What changed between these approaches?** We are now moving toward a target class, not away from the original. With that, we only care about the score that the model returns for the _target_ class. `new_score = output[0, target_index.item()].item()` grabs the logit the model returned for the target index. We then skip any candidate masks that do not provide a score as good or better than the best score.
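- **Why do we randomly sample the candidate masks instead of iterating through them?** We are _accumulating_ a final mask. Remember that when we generated our large random mask tensor, we dramatically downscaled the individual mask values. Here we are slowly building a mask out of those tiny random perturbations. `current_mask += mask_candidate` adds the `mask_candidate` that, when added to the `current_mask`, improved our classification score for the target class. Sampling at random (with replacement) means the loop is not limited to a single pass through the collection: a candidate that was rejected against an earlier mask may help later, and a useful candidate can be applied more than once.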
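
The accept-or-reject loop described above can be tried without the image model. The following is a minimal sketch, assuming a hypothetical two-class linear scorer (`toy_logits`) in place of MobileNetV2 and plain Python lists in place of tensors. The names `toy_logits` and `targeted_random_search` are illustrative and do not come from the lab, and unlike the solution cell this version only checks for success after a candidate has been accepted.

```python
import random


def toy_logits(x):
    """Hypothetical two-class linear scorer standing in for the real model."""
    w0 = [1.0, -0.5, 0.25]   # weights for class 0
    w1 = [-1.0, 0.5, 0.75]   # weights for class 1
    return [
        sum(w * v for w, v in zip(w0, x)),
        sum(w * v for w, v in zip(w1, x)),
    ]


def targeted_random_search(x, target, eta=0.05, max_queries=10_000, seed=0):
    """Accumulate small random masks that never lower the target class score."""
    rng = random.Random(seed)
    mask = [0.0] * len(x)
    best_score = toy_logits(x)[target]

    for queries in range(1, max_queries + 1):
        # Draw a small Gaussian candidate, scaled down by eta
        candidate = [rng.gauss(0.0, 1.0) * eta for _ in x]
        trial_mask = [m + c for m, c in zip(mask, candidate)]
        logits = toy_logits([xi + mi for xi, mi in zip(x, trial_mask)])

        # Reject candidates that lower the target class score
        if logits[target] < best_score:
            continue

        # Keep the candidate: it becomes part of the accumulated mask
        best_score = logits[target]
        mask = trial_mask

        # Stop once the target class has the highest score
        if logits.index(max(logits)) == target:
            return mask, queries

    return mask, max_queries


x = [1.0, 0.0, 0.0]  # starts out as class 0
mask, queries = targeted_random_search(x, target=1)
adversarial = [xi + mi for xi, mi in zip(x, mask)]
print(f"before: {toy_logits(x)}  after: {toy_logits(adversarial)}  queries: {queries}")
```

Each accepted candidate is tiny, so no single step flips the prediction; the flip comes from the accumulated sum, which is the same role `current_mask` plays in the solution.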

## HopSkipJump
### Resources
All are optional. The video at the top of the lab should provide you with the basic context; these are just here if you want to know more.
- [HopSkipJump Paper](https://arxiv.org/abs/1904.02144)
- ["I ain't reading all that", fine, here's a video someone made on the paper](https://www.youtube.com/watch?v=vkCifg2rp34)

![](../assets/2-evasion-hsj.png)

*Source: linked paper above*

No code here, these exercises are really just about your understanding of the intuition behind HopSkipJump.

### SOLUTION: Exercise 2
"Why is the `normalized_gradient` the same shape and size as our `img_tensor`?"

A gradient, at its core, is all of a function's partial derivatives. Everything comes back to calculus. 

Say we have a function 

$$
f(x, y) = x^3 + 3y^2
$$ 


This function has **two variables** and therefore **two partial derivatives**. Remember that when we take the partial derivative of a function with respect to one variable, we treat all other variables as constants. Using the visual above, you can think of the partial derivative of the function with respect to x at any point as the slope of the function along the x direction at that point. The two partial derivatives are:

$$
\frac{\partial{(x^3 + 3y^2)}}{\partial{x}} = 3x^2
$$

$$
\frac{\partial{(x^3 + 3y^2)}}{\partial{y}} = 6y
$$

The gradient with respect to a tensor is the collection of partial derivatives of a function (a loss function in our case) with respect to each individual element. So the gradient of $f(x, y) = x^3 + 3y^2$ would just be a 2 element vector, $[3x^2, 6y]$. 

The gradient of a function will always have the same number of components as the number of variables in the function. The gradient is a vector of partial derivatives of the loss function with respect to each input element. It shows how the loss changes with small changes in each input dimension. It just so happens in our case that _every_ pixel in the image is a "variable".
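
To see the "one component per variable" idea numerically, here is a small sketch that approximates the gradient of the same $f(x, y)$ with central finite differences and compares it to $[3x^2, 6y]$. The helper name `numerical_gradient` and the step size `h` are assumptions made for this illustration; they are not part of the lab code.

```python
def f(x, y):
    return x**3 + 3 * y**2


def numerical_gradient(func, point, h=1e-6):
    """Approximate each partial derivative by nudging one variable at a time."""
    grad = []
    for i in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[i] += h    # nudge only variable i up
        minus[i] -= h   # nudge only variable i down
        grad.append((func(*plus) - func(*minus)) / (2 * h))
    return grad


point = [2.0, 1.0]
approx = numerical_gradient(f, point)
exact = [3 * point[0] ** 2, 6 * point[1]]  # analytic gradient [3x^2, 6y]

print(f"inputs: {len(point)}  gradient components: {len(approx)}")
print(f"approx: {approx}  exact: {exact}")
```

The loop runs once per input variable, so the result always has as many entries as the input. With an image, the same loop would run once per pixel value, which is why the gradient has the shape of `img_tensor`.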

### SOLUTION: Exercise 3

Okay, I lied, you will need to run this chunk of setup code for these solutions to work in this notebook. This way you don't need to jump back and forth.

In [None]:
# preprocess the image and move it to the right device
img_tensor = preprocess(img).unsqueeze(0).to(device)

with torch.no_grad():
    output = model(img_tensor)
    
y_original = output[0].argmax()

def adversarial_satisfactory(samples, target, clip_min, clip_max):
    samples = torch.clamp(samples, clip_min, clip_max)
    
    with torch.no_grad():
        preds = model(samples).argmax(dim=1)
    
    # any other class
    result = preds != target
    return result

clip_min = -2
clip_max = +2

# seed the generator with a fixed value so the run is reproducible
generator = torch.Generator().manual_seed(0)

# now generate a misclassified sample; we'll give ourselves 10 tries
for _ in range(10):
    random_img = torch.FloatTensor(img_tensor.shape).uniform_(clip_min,clip_max, generator=generator).to(device)
    random_class = model(random_img).argmax()

    if adversarial_satisfactory(random_img, y_original, clip_min, clip_max):
        initial_sample = (random_img, y_original)

        print(f"Found misclassified image: {random_class}")
        break

threshold = 0.01 / torch.sqrt(torch.prod(torch.tensor(img_tensor.shape, dtype=torch.float).to(img_tensor.device)))

upper_bound, lower_bound = 1,0 

1. The boundary adversarial example does in fact get closer to the original image in img_tensor every time we project then bisect. (think distance!)

We can iteratively repeat the projection and bisection that happen within the `while` loop. That loop currently stops as soon as the adversarial conditions are satisfied, but we can move the example closer and closer by repeating its inner steps for N iterations, computing the L2 distance between the current anchor image and the original image at each step.

In [None]:
distances = []
for i in range(100):
    boundary_adversarial_example = (1-upper_bound)*img_tensor + upper_bound*random_img
    distance = torch.norm(boundary_adversarial_example - img_tensor, p=2)
    distances.append(distance.item())
    midpoint = (upper_bound + lower_bound) / 2.0
    
    interpolated_sample = (1 - midpoint) * img_tensor + midpoint * random_img
    
    if adversarial_satisfactory(interpolated_sample, y_original, clip_min, clip_max):
        # the decision boundary lies between midpoint and lower
        upper_bound, lower_bound = midpoint, lower_bound
    else:
        # it's the other way
        upper_bound, lower_bound = upper_bound, midpoint
    
boundary_adversarial_example = (1-upper_bound)*img_tensor + upper_bound*random_img
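
The same bisection can be watched on a problem small enough to check by hand. This is a toy sketch, assuming a hypothetical 2D decision rule (`is_adversarial`) instead of the model and two hand-picked points instead of images; none of these names or values come from the lab.

```python
import math


def is_adversarial(p):
    """Hypothetical decision rule: points with x + y > 1 count as misclassified."""
    return p[0] + p[1] > 1.0


original = (0.0, 0.0)         # classified correctly
far_adversarial = (2.0, 2.0)  # misclassified starting point


def blend(alpha):
    """Interpolate between the original (alpha=0) and the adversarial point (alpha=1)."""
    return tuple((1 - alpha) * o + alpha * a for o, a in zip(original, far_adversarial))


lower, upper = 0.0, 1.0
distances = []
for _ in range(20):
    # Distance from the original to the current boundary candidate
    distances.append(math.dist(original, blend(upper)))
    mid = (lower + upper) / 2.0
    if is_adversarial(blend(mid)):
        upper = mid  # boundary lies between lower and mid
    else:
        lower = mid  # boundary lies between mid and upper

final_distance = math.dist(original, blend(upper))
print(f"final blend factor: {upper:.6f}  final distance: {final_distance:.6f}")
```

Here the boundary sits at a blend factor of 0.25, so the distance should settle near $\sqrt{0.5} \approx 0.7071$. The recorded distances never increase, because `upper` only ever moves toward the original.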

2. The distance between the original image and the modified image begins to slowly stabilize to some constant value. (track distance through time)

If you run the code above, you'll see the distance converge to around 140 very quickly. We definitely don't need 100 iterations. We can see this via a plot.

In [None]:
plt.figure(figsize=(10, 6))
plt.plot(range(len(distances)), distances, marker='o')

plt.title('Distance between Anchor and Original w.r.t. HSJ Iteration')
plt.xlabel('Index')
plt.ylabel('Distance')

3. (Optional) Add an early exit criterion to the HopSkipJump attack to reduce the number of model calls (don't just reduce the number of iterations; stop in a data-driven way), and compare the results from the full run to the early exit run. As an attacker, why might you want to early exit from the optimization?

In the same way as above, we can use the distance between the new anchor image and the original image. As we saw, it converges pretty quickly, so we could add an early exit condition that halts when the distance stops changing by more than some threshold amount. 

Why? Well, we're an attacker, and we don't want to get caught! Hammering a model like this could trigger detections or rate limits if we aren't careful.
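
One possible way to sketch a data-driven early exit is shown below. Comparing consecutive distances directly can stop too soon, because the distance does not change on iterations where the midpoint is not adversarial. This sketch instead uses the width of the bisection bracket to bound how much the distance can still shrink. It assumes a hypothetical `is_adversarial(alpha)` callable that wraps one model query for a given blend factor, and a `scale` equal to the L2 distance between the original image and the starting adversarial sample; both are stand-ins, not lab code.

```python
def bisect_with_early_exit(is_adversarial, scale, max_iters=100, tol=1e-3):
    """Bisect the blend factor; stop once the distance cannot shrink by more than tol.

    The boundary example sits at distance upper * scale from the original,
    so (upper - lower) * scale bounds the remaining possible improvement.
    """
    lower, upper = 0.0, 1.0
    queries = 0
    while queries < max_iters and (upper - lower) * scale > tol:
        mid = (lower + upper) / 2.0
        queries += 1  # one model call per bisection step
        if is_adversarial(mid):
            upper = mid
        else:
            lower = mid
    return upper * scale, queries


# Toy stand-in for the model: blends above 0.3 are misclassified
def toy_oracle(alpha):
    return alpha > 0.3


scale = 10.0
full_distance, full_queries = bisect_with_early_exit(toy_oracle, scale, tol=0.0)
early_distance, early_queries = bisect_with_early_exit(toy_oracle, scale, tol=1e-3)

print(f"full run:   distance={full_distance:.6f}  queries={full_queries}")
print(f"early exit: distance={early_distance:.6f}  queries={early_queries}")
```

With `tol=0.0` the loop uses its whole query budget, while the early exit stops as soon as further queries can no longer improve the distance by more than `tol`. The two distances differ by at most `tol`, so the attacker gives up very little in exchange for far fewer model calls.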

# More Practice
You can try your hand at more of these at [Crucible](https://crucible.dreadnode.io), an AI CTF. The "Granny" challenge in particular may interest you after this lab.