# Setup

Perform all necessary imports up front.

In [None]:
# Python standard imports
from pprint import pprint

# armory-library imports
import armory.engine
import armory.tasks.image_classification
import armory.evaluation

# armory-examples imports
import armory.examples.image_classification.food101 as food101

# Define the Evaluation

## Model

From our `food101` example, we will load a model from HuggingFace that has
already been fine-tuned on the food-101 dataset. We also wrap this model in an
Adversarial Robustness Toolbox (ART) estimator so that we can use an ART attack
against the model.

In [None]:
model, art_estimator = food101.load_model()

## Dataset

From our `food101` example, we will load the food-101 dataset from HuggingFace.

In [None]:
dataset = food101.load_huggingface_dataset(batch_size=2)

## Attack

From our `food101` example, we create a Projected Gradient Descent (PGD) attack
using the Adversarial Robustness Toolbox (ART).

In [None]:
attack = food101.create_attack(art_estimator)
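
To make the idea behind PGD concrete, here is a minimal sketch in plain Python. It is not the ART implementation and not what `food101.create_attack` configures: the function `pgd_linf`, the linear toy loss, and the values of `eps`, `step`, and `iters` are all assumptions chosen for illustration. Each iteration steps along the sign of the loss gradient, then projects the result back into the L-infinity ball around the original input and into the valid pixel range.

```python
def pgd_linf(x, grad_fn, eps=0.03, step=0.01, iters=10):
    """Toy L-infinity PGD on a flat list of pixel values in [0, 1].

    Illustrative only: grad_fn is a hypothetical callable returning the
    gradient of the loss with respect to the input.
    """
    x_adv = list(x)
    for _ in range(iters):
        grad = grad_fn(x_adv)
        for i, g in enumerate(grad):
            # Step in the direction of the gradient sign to increase the loss.
            sign = (g > 0) - (g < 0)
            moved = x_adv[i] + step * sign
            # Project back into the eps-ball around the original pixel...
            moved = min(max(moved, x[i] - eps), x[i] + eps)
            # ...and into the valid pixel range.
            x_adv[i] = min(max(moved, 0.0), 1.0)
    return x_adv


# Hypothetical linear "loss" w . x, whose gradient is simply w.
weights = [0.5, -2.0, 0.0, 1.5]
x_clean = [0.2, 0.8, 0.5, 0.99]
x_adv = pgd_linf(x_clean, grad_fn=lambda _: weights)

# Largest per-pixel change; bounded by eps because of the projection step.
linf = max(abs(a - b) for a, b in zip(x_adv, x_clean))
```

In the real attack, the gradient comes from the wrapped model's loss rather than from a fixed weight vector.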

## Metrics

From our `food101` example, we create the metrics to be collected during the
evaluation. These are the L-infinity norm distance between the unperturbed and
perturbed inputs, and the categorical accuracy of the predicted labels against
the natural (ground-truth) labels.

In [None]:
metric = food101.create_metric()
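
As a rough illustration of what these two metrics measure, the following plain-Python sketch computes them on small hand-written lists. The helper names `linf_distance` and `categorical_accuracy` are hypothetical and are not the armory-library metric classes, which operate on tensors.

```python
def linf_distance(clean, perturbed):
    """Largest absolute per-element difference between two inputs."""
    return max(abs(c - p) for c, p in zip(clean, perturbed))


def categorical_accuracy(labels, predictions):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Made-up pixel values and class indices, for illustration only.
clean = [0.10, 0.50, 0.90]
perturbed = [0.13, 0.48, 0.90]
distance = linf_distance(clean, perturbed)

accuracy = categorical_accuracy([3, 7, 7, 1], [3, 7, 2, 1])
```

Here the largest per-element change is about 0.03, and three of the four predictions match their labels.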

# Evaluation

We combine the model, dataset, attack, and metrics to fully define our
evaluation.

We will define two perturbation chains: `benign` and `attack`. The benign chain
does not apply any perturbations to the data, giving us the intrinsic
performance of the model. The attack chain will give us the performance of the
model under adversarial attack.

In [None]:
evaluation = armory.evaluation.Evaluation(
    name="image-classification-food101",
    description="Image classification of food-101",
    author="TwoSix",
    dataset=dataset,
    model=model,
    perturbations={
        "benign": [],
        "attack": [attack],
    },
    metric=metric,
)
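
Conceptually, each named chain is an ordered list of perturbations applied one after another to every batch, and an empty list leaves the batch unchanged. The sketch below shows that idea in plain Python. It is not how armory-library implements chains internally, and `apply_chain` and `brighten` are hypothetical names used only for illustration.

```python
def apply_chain(batch, chain):
    """Apply each perturbation in order; an empty chain is a no-op."""
    for perturbation in chain:
        batch = perturbation(batch)
    return batch


def brighten(batch):
    """Hypothetical stand-in for an attack: shift pixels up, clip at 1.0."""
    return [min(value + 0.03, 1.0) for value in batch]


chains = {"benign": [], "attack": [brighten]}
batch = [0.2, 0.5, 0.99]

# One output per chain, keyed by the chain name.
outputs = {name: apply_chain(batch, chain) for name, chain in chains.items()}
```

Keying the outputs by chain name mirrors how the results below are reported under `benign/...` and `attack/...` prefixes.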

# Execute the Evaluation

We create an evaluation engine that handles applying all perturbations,
obtaining predictions from the model, collecting metrics, and exporting
samples.

In [None]:
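# Wrap the evaluation in an image classification task, exporting samples
# from every batch
task = armory.tasks.image_classification.ImageClassificationTask(
    evaluation, export_every_n_batches=1
)
# Limit the run to the first 2 test batches to keep this example short
engine = armory.engine.EvaluationEngine(task, limit_test_batches=2)
results = engine.run()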

In [12]:
pprint(results)

{'compute': {'Avg. CPU time (s) for 4 executions of attack/perturbation': 4.5436018107502605,
             'Avg. CPU time (s) for 4 executions of attack/perturbation/PGD': 4.543588242249825,
             'Avg. CPU time (s) for 4 executions of attack/predict': 0.07061068424809491,
             'Avg. CPU time (s) for 4 executions of benign/perturbation': 1.361251634079963e-06,
             'Avg. CPU time (s) for 4 executions of benign/predict': 0.1183262479971745},
 'metrics': {'attack/accuracy': tensor(0.),
             'attack/linf_norm': tensor(0.0310),
             'benign/accuracy': tensor(1.),
             'benign/linf_norm': tensor(0.)}}
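
One way to summarize these results is the drop in accuracy between the benign and attack chains. The sketch below assumes only the dictionary layout printed above (a `metrics` key with `benign/accuracy` and `attack/accuracy` entries). The helper `accuracy_drop` is hypothetical, and the example dictionary uses plain floats in place of tensors.

```python
def accuracy_drop(results):
    """Difference between benign and attacked accuracy."""
    metrics = results["metrics"]
    # float() also accepts zero-dimensional tensors like those printed above.
    benign = float(metrics["benign/accuracy"])
    attacked = float(metrics["attack/accuracy"])
    return benign - attacked


# Stand-in for the printed results, with floats instead of tensors.
example = {"metrics": {"benign/accuracy": 1.0, "attack/accuracy": 0.0}}
drop = accuracy_drop(example)
```

With only two batches evaluated, such a number describes a very small sample and should be read as a smoke test rather than a robustness measurement.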
