![DLI Logo](../assets/DLI_Header.png)

# Introduction

In this module, you will learn how to think about machine learning problems and attacks as **optimizations**, and we'll introduce a new tool: [Optuna](https://optuna.org/).

## Learning Objectives:
1. Apply `Optuna` to optimize attack hyperparameters

# Optimize
We've reached a crucial milestone and we're about to come full circle on a whole load of concepts. We ended the previous section lamenting that there are entirely too many techniques to know which is best. Even if you knew which algorithm was best for your situation, there are _still_ hyperparameters to choose. We've conveniently hand-waved explaining hyperparameters in any real detail in anticipation of this lab. Rather than optimize a model (for which there are several references), we're going to optimize our attacks using Optuna.

Optuna is an open-source hyperparameter optimization (HPO) framework. It's normally used to automate the process of finding the best set of hyperparameters for a machine learning model. Recall that hyperparameters are typically the "fixed" values of an algorithm that define its behavior or the constraints (like a distance metric) it needs to work within. Optuna basically builds another model using Bayesian optimization techniques to infer the optimal values. Bayesian optimization works roughly as follows:

1. **Define a Prior**: This is an initial assumption about the function being optimized, made before any evaluations.
2. **Collect Data and Update the Prior**: This involves evaluating the actual function at certain points, and then using this data to update our prior belief about the function. This updated belief is called the posterior, and it represents a kind of best guess at what the function looks like, given the current data.
3. **Choose the Next Point to Evaluate**: After updating the posterior, we need to decide where to evaluate the function next. This is done by applying an acquisition function to the posterior. The acquisition function trades off exploration (testing areas where we are uncertain about the function) and exploitation (focusing on areas where the function already looks promising). Common choices for the acquisition function include `Expected Improvement`, `Probability of Improvement`, and `Upper Confidence Bound`.
4. **Iterate**: Steps 2 and 3 are then repeated.
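
The four steps above can be sketched in a few lines of NumPy. This is a generic illustration, not how Optuna is implemented: it assumes a Gaussian process surrogate with an RBF kernel and a lower-confidence-bound acquisition function (the minimization form of `Upper Confidence Bound`). The function `f`, the kernel length scale, and the constant `2.0` are all arbitrary choices made for the example.

```python
import numpy as np

def f(x):
    """The 'expensive' black-box function we want to minimize."""
    return (x - 2.0) ** 2

def rbf(a, b, length_scale=3.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

def posterior(x_obs, y_obs, x_grid, jitter=1e-4):
    """Step 2: condition the zero-mean GP prior on the observations."""
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_grid)                      # shape (n_obs, n_grid)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

rng = np.random.default_rng(0)
x_grid = np.linspace(-10.0, 10.0, 401)            # candidate points
x_obs = rng.uniform(-10.0, 10.0, size=3)          # a few random starting points
y_obs = f(x_obs)

for _ in range(15):                               # Step 4: iterate
    # Step 1: standardize y so a zero-mean, unit-variance prior is reasonable.
    y_scaled = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-9)
    mean, std = posterior(x_obs, y_scaled, x_grid)
    # Step 3: acquisition. Low mean = exploitation, high std = exploration.
    lcb = mean - 2.0 * std
    x_next = x_grid[np.argmin(lcb)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next))

best_x = x_obs[np.argmin(y_obs)]
print(f"best x = {best_x:.3f}, f(x) = {y_obs.min():.4f}")
```

Note that the loop never looks at a gradient of `f`; it only calls the function and records the result.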

The goal of using Optuna (and HPO in general) is to find the set of hyperparameters that will result in the best performance for a given machine learning model, based on a specified evaluation metric. This process involves defining a space of possible hyperparameters and then systematically exploring this space, typically through multiple `trials`. An advantage of Bayesian optimization is that it doesn't require any derivatives (gradients) of the function, which makes it suitable for optimizing black-box functions. It's perhaps best to think of Optuna as an optimizer of processes, rather than data.

In this lab we're going to revisit work from previous labs and apply optimization techniques to make them _better_. 


# Imports and Model
We'll start by importing everything we need at the top and loading our target model.

In [None]:
# DO NOT CHANGE

import sys
sys.path.append("../")
from libs.controls import modifier

import optuna
import torch
import numpy as np
import zipfile
from torchvision import transforms
from PIL import Image

from art.estimators.classification import BlackBoxClassifierNeuralNetwork
from art.utils import compute_success
from art.attacks.evasion import SimBA
from torch.nn.functional import softmax

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

In [None]:
# DO NOT CHANGE

target_model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', weights='MobileNet_V2_Weights.DEFAULT', verbose=False)
target_model.eval()
target_model.to(device);

# Define the transforms for preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256), 
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
]);

unnormalize = transforms.Normalize(
   mean= [-m/s for m, s in zip([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])],
   std= [1/s for s in [0.229, 0.224, 0.225]]
)

with open("../data/labels.txt", 'r') as f:
    labels = [label.strip() for label in f.readlines()]

img = Image.open("../data/dog.jpg")
img_tensor = preprocess(img).unsqueeze(0)
unnormed_img_tensor = unnormalize(img_tensor).to(device)

# Optuna
Optuna is easy to use. Here is how we set up an optimization problem. We'll use a toy example where we want to find the minimum of the function $(x-2)^2$. The true answer is $x=2$, but Optuna doesn't know that and will need to sample the space we define and infer the answer from the examples it creates.

First we define an `objective` function. Think of this like a loss function. Here we want to minimize the squared difference between $x$ and $2$. We'll define the sample space as the float range $-10 \le x \le 10$ (`suggest_float` includes both endpoints).

In [None]:
# DO NOT CHANGE

def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 2) ** 2

Then we create a "study". `create_study` is the entry point for creating and configuring a `Study` instance. When you call `optimize` on the `Study`, it manages the full optimization loop internally: the study's sampler proposes parameter values, the objective is evaluated, and the result is recorded. Then, after `n_trials` have been completed, the study holds the "best" parameters found within the tested range.

In [None]:
# DO NOT CHANGE

study = optuna.create_study()

In [None]:
# DO NOT CHANGE

# Run the objective n_trials times to optimize
study.optimize(objective, n_trials=20)

Let's see how well Optuna did finding the minimum.  You can run the cell above with more `n_trials` to get closer to the true minimum.

In [None]:
# DO NOT CHANGE

print(f"Best params::\n---------------\n{study.best_params}\n")

By default, Optuna uses the Tree-structured Parzen Estimator (TPE), a sequential model-based optimization (SMBO) approach that builds a probabilistic model of the objective function to suggest new parameters. TPE specifically models $P(x|y)$ and $P(y)$ where $x$ represents parameters and $y$ is the associated cost. The Bayesian base of this algorithm means it becomes better at selecting parameters as the number of trials increases.

- **Initialization**: Start with initial random samples from the defined space and their corresponding objective function values.
- **Model building**: Using the collected data, build two probability models, `l(x)` for parameters that improved the model's performance and `g(x)` for parameters that did not. Both densities are estimated with Parzen (kernel density) estimators, which is where the algorithm gets its name.
- **Suggestion**: For the next round, suggest new hyperparameters to try. This is based on calculating the Expected Improvement (EI) over the current best parameters, where the EI of a set of hyperparameters is proportional to the ratio `l(x) / g(x)`. In other words, we prefer regions of the space where good hyperparameters are more likely than bad ones according to the model.
- **Iteration**: Evaluate the objective function with the new parameters, update the models, and repeat from the model-building step. Over time, the TPE algorithm should home in on the best parameters.

The primary advantages of TPE (and other SMBO techniques) are that it can handle both continuous and discrete hyperparameters, and that it doesn't require the objective function to be differentiable (as gradient-based methods do).
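
To make the TPE steps above concrete, here is a minimal one-dimensional sketch using only the standard library. It is a simplification under stated assumptions, not Optuna's implementation: the bandwidth is fixed at `1.0`, the "good" group is the best 25% of trials (`GAMMA`), and candidates are drawn from the good density and ranked by the ratio `l(x) / g(x)`. All names in it are illustrative.

```python
import math
import random

def toy_objective(x):
    return (x - 2.0) ** 2

def kde(points, bandwidth=1.0):
    """Return a Parzen (Gaussian kernel) density estimate over `points`."""
    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return density

random.seed(0)
LOW, HIGH, GAMMA = -10.0, 10.0, 0.25

# Initialization: random samples and their objective values.
history = [(x, toy_objective(x))
           for x in (random.uniform(LOW, HIGH) for _ in range(10))]

for _ in range(30):
    # Model building: split trials into the best GAMMA fraction and the rest.
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, int(GAMMA * len(ranked)))
    good_points = [x for x, _ in ranked[:n_good]]
    good_pdf = kde(good_points)                       # plays the role of l(x)
    bad_pdf = kde([x for x, _ in ranked[n_good:]])    # plays the role of g(x)

    # Suggestion: sample candidates from l(x), keep the one with the best l/g ratio.
    candidates = [
        min(HIGH, max(LOW, random.gauss(random.choice(good_points), 1.0)))
        for _ in range(24)
    ]
    x_next = max(candidates, key=lambda x: good_pdf(x) / (bad_pdf(x) + 1e-12))

    # Iteration: evaluate the new point and add it to the history.
    history.append((x_next, toy_objective(x_next)))

best_x, best_y = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, objective = {best_y:.4f}")
```

Sampling a candidate from a kernel density estimate is just "pick one of the stored points, then add Gaussian noise", which is why the suggestion step needs no gradients.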

# Optimize an Attack
First, we'll bring forward ART and write a `predict` function for the attack to use. Then we build the attack as before. One thing you may notice is that instead of `BlackBoxClassifier`, we now have `BlackBoxClassifierNeuralNetwork`. This is because we're using `SimBA`, which requires probabilities, and ART is reasonably strict about what is needed upfront before you can start running an attack.

In [None]:
# DO NOT CHANGE

class ModelWrapper: 
    def __init__(self):
        pass
    
    def predict(self, x):
        torch_tensor = torch.from_numpy(x).to(device)

        with torch.no_grad():
            output = target_model(torch_tensor)        
        probs = torch.softmax(output, dim=1).cpu().numpy()
        return probs

model_wrapper = ModelWrapper()

In [None]:
# DO NOT CHANGE

classifier = BlackBoxClassifierNeuralNetwork(
    predict_fn = lambda x:model_wrapper.predict(x),
    nb_classes = len(labels),
    input_shape = img_tensor[0].shape
)

attack = SimBA(classifier)

In [None]:
# DO NOT CHANGE

print("Attack params\n---------------")
[print(f"{i}: {attack.__dict__.get(i)}") for i in attack.attack_params];


This is the part we've failed to mention, so don't be mad. `SimBA` (like pretty much every attack) has default hyperparameters that are set when the attack is created. These are only defaults, though, and we don't know where they came from or what assumptions were made when they were chosen. We've learned that coming up with reasonable values by hand is futile. So here we are, using yet another optimization technique to make a number go arbitrarily up or down. If you ever wondered why `1` sample turned into `64` samples in your ART `predict` functions, the `batch_size` hyperparameter is the answer.

Let's wrap our attack with Optuna and define what we want to minimize in our attack. In this case, we want to minimize `l2_norm`.  

Here we write an objective function that runs a chosen `attack` (SimBA). As we're not giving Optuna any variables or suggested ranges, it won't optimize anything yet. 

You will see all the parameter values it chose (just an empty set for now) and the result of the `trial`, before printing the best parameters (none yet) at the end.

In [None]:
# DO NOT CHANGE

def objective(trial, attack, x):
    with torch.no_grad():
        results = attack.generate(x=x)
        
    l2_norm = torch.norm(img_tensor - results, p=2).float()

    return l2_norm

In [None]:
# DO NOT CHANGE

study = optuna.create_study()
study.optimize(
    lambda trial: objective(trial, attack=attack, x=img_tensor.numpy()), 
    n_trials=5, show_progress_bar=True
);

print(f"\nBest params::\n---------------\n{study.best_params}\n")

The same result every time - this is a good baseline to start experimenting with. Now that you have the mechanics down, let's minimize this `L2` distance metric. Here we update the `objective` function to:

1. Give Optuna access to the attack parameters. Ranges for the parameters here are arbitrarily chosen to bracket the default attack hyperparameters we exposed earlier.
2. Update the attack with the new parameters.
3. Execute the attack.
4. Return the L2 distance between the original image and the adversarial image. This is what we want to minimize (arbitrarily).

What we hope to see is the `L2` distance going down...

In [None]:
# DO NOT CHANGE

def objective(trial, attack, x):
    new_params = {
        "max_iter": trial.suggest_int('max_iter',  10, 3000),
        "epsilon": trial.suggest_float('epsilon', 1e-6, 1.0),
        "freq_dim": trial.suggest_int('freq_dim', 1, 20),
        "stride": 1,
        "batch_size": 1,
        "verbose": False
    }

    attack.set_params(**new_params)
    
    with torch.no_grad():
        results = attack.generate(x = x)
        
    l2_norm = torch.norm(img_tensor - results, p=2).float()

    return l2_norm

study = optuna.create_study()
study.optimize(
    lambda trial: objective(trial, attack=attack, x=img_tensor.numpy()), n_trials=50, show_progress_bar=True
);

print(f"\nBest params::\n---------------\n{study.best_params}\n")

That's quite the improvement over the default values for our `L2` metric! This metric is a little arbitrary as the attack technically already does this. Let's do something more useful. 

## Optimize 2 Values

Here we add another metric that is more realistic - the number of queries, `num_queries`, we send the model. First we need to add logging functionality to our attack, which we will do in the `predict` function. Then we can simply return another value from the `objective` function and update the `create_study` call to provide a list of `directions`, one per returned value.

In [None]:
# DO NOT CHANGE

class ModelWrapper: 
    def __init__(self):
        self.__reset__()
    
    def predict(self, x):
        torch_tensor = torch.from_numpy(x).to(device)

        with torch.no_grad():
            output = target_model(torch_tensor)        
        probs = torch.softmax(output, dim=1).cpu().numpy()
        
        self.num_queries += 1
        
        return probs
    
    def __reset__(self):
        self.num_queries = 0

model_wrapper = ModelWrapper()

In [None]:
# DO NOT CHANGE

classifier = BlackBoxClassifierNeuralNetwork(
    predict_fn = lambda x:model_wrapper.predict(x),
    nb_classes = len(labels),
    input_shape = img_tensor[0].shape
)

attack = SimBA(classifier)
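
Note that the wrapper above counts calls to `predict`. With `batch_size` set to `1` that is the same as counting images, but once a call carries a larger batch the two numbers diverge. The sketch below is a hypothetical variant that tracks both; `CountingWrapper` and `fake_predict` are illustrative names, and the stand-in model just returns uniform probabilities so the example runs without the real target.

```python
import numpy as np

class CountingWrapper:
    """Hypothetical wrapper that tracks predict calls and samples separately."""

    def __init__(self, predict_fn):
        self.predict_fn = predict_fn
        self.reset()

    def reset(self):
        self.num_calls = 0
        self.num_samples = 0

    def predict(self, x):
        self.num_calls += 1
        self.num_samples += x.shape[0]   # one row per image in the batch
        return self.predict_fn(x)

def fake_predict(x):
    # Stand-in for the real model: uniform probabilities over 10 classes.
    return np.full((x.shape[0], 10), 0.1)

wrapper = CountingWrapper(fake_predict)
wrapper.predict(np.zeros((1, 3, 8, 8), dtype=np.float32))    # a single image
wrapper.predict(np.zeros((64, 3, 8, 8), dtype=np.float32))   # a batch of 64
print(wrapper.num_calls, wrapper.num_samples)  # prints "2 65"
```

Which number you report depends on how the target meters usage: per request or per input.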

We set `num_queries = 0` using the `__reset__` method and run the attack, returning both `l2_norm` and `num_queries`. Then we update the `create_study` call to include directions for _both_ return values, and Optuna will take care of the rest!

In [None]:
# DO NOT CHANGE

def objective(trial, attack, x):
    model_wrapper.__reset__()
    new_params = {
        "max_iter": trial.suggest_int('max_iter',  10, 3000),
        "epsilon": trial.suggest_float('epsilon', 1e-6, 1.0),
        "freq_dim": trial.suggest_int('freq_dim', 1, 20),
        "stride": 1,
        "batch_size": 1,
        "verbose": False
    }

    attack.set_params(**new_params)
    
    with torch.no_grad():
        results = attack.generate(x = x)
    l2_norm = torch.norm(img_tensor - results, p=2)
    
    num_queries = model_wrapper.num_queries
    
    return l2_norm, num_queries

In [None]:
# DO NOT CHANGE

study = optuna.create_study(directions=["minimize", "minimize"])
study.optimize(
    lambda trial: objective(trial, attack=attack, x=img_tensor.numpy()), n_trials=100, show_progress_bar=True
);

## Tiny Assessment (Optional Challenge)

:::{exercise}
Put it all together for a full assessment. We want to characterize the robustness of this model (or the effectiveness of the attack) under different query-budget constraints. Do the following.

1. Rewrite the objective function so that it runs the attack multiple times (say 100) and computes the attack success rate.
2. Fix the query budget; optimize the remaining hyperparameters (`epsilon` and `freq_dim`) under the constrained query budget.
3. Perform the optimization/estimation step for a range of query budgets (try 10, 50, 100, 200, 300, 400, 500, 1000)
4. Visualize the results, showing the best attainable success rate of the attack against the query budget. Think through how you would report this to a data scientist.
:::

In [None]:
# your code here

# Conclusion

In this lab we optimized the optimizer! This is another favorite technique of ours - the framework defaults are _okay_, but they are by definition not the strongest possible attack. Each model is different; use Optuna to express that in your assessments. And don't forget to track which parameters work best against which targets!

## What You Learned

1. How to think about ML and ML security problems as optimizations.
2. How to apply `Optuna` for hyperparameter optimization.

**Move on to the [Inversion Module](../5_inversion/1_inversion_and_membership_inference.ipynb).**
