# Armory Evaluation Declarations, Composition, and Modification

The `charmory.blocks` package contains data declarations that reproduce standard
evaluations from the Armory package.

In [1]:
import charmory.blocks.cifar10

baseline = charmory.blocks.cifar10.baseline



The `baseline` evaluation is a composite dataclass with some metadata fields
describing the evaluation:

In [2]:
from pprint import pprint
pprint(f"{type(baseline)=}")
pprint(f"{baseline.name=}\n{baseline.description=}\n{baseline.author=}")

"type(baseline)=<class 'charmory.evaluation.Evaluation'>"
("baseline.name='cifar_baseline'\n"
 "baseline.description='Baseline cifar10 image classification'\n"
 "baseline.author='msw@example.com'")
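To make "composite dataclass" concrete, here is a minimal sketch using stand-in classes. The `Dataset`, `Metric`, and `Evaluation` definitions below are hypothetical and much smaller than charmory's real ones; only the metadata field names (`name`, `description`, `author`) are taken from the output above. It also shows that a dataclass lists its fields in declaration order, both in `fields()` and in its repr.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Dataset:
    # Hypothetical stand-in, not charmory's actual definition.
    name: str
    batch_size: int = 64


@dataclass
class Metric:
    # Hypothetical stand-in for an optional component.
    task: str = "categorical_accuracy"


@dataclass
class Evaluation:
    # Metadata fields, mirroring the names printed above.
    name: str
    description: str
    author: str
    # Components are themselves dataclass instances.
    dataset: Dataset
    metric: Optional[Metric] = None


sketch = Evaluation(
    name="cifar_sketch",
    description="Illustrative composite",
    author="someone@example.com",
    dataset=Dataset(name="CIFAR10"),
)

# Fields come back in declaration order, which is also the order repr() uses.
field_names = [f.name for f in fields(sketch)]
dataset_repr = repr(sketch.dataset)
print(field_names)
print(dataset_repr)
```

Declaring `name` first in a component is a simple way to keep the most identifying detail at the front of a long repr.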


The charmory `Evaluation` class was called "Experiment" in prior versions of the
JATIC Armory library, but was renamed to avoid confusion with the MLflow conception
of Experiment, which is a collection of runs.

Along with the metadata, an `Evaluation` contains some required components. The
`dataset` is a `Dataset` object, which specifies an Armory dataset and a pair
of necessary parameters:

In [3]:
pprint(f"{baseline.dataset=}")

("baseline.dataset=Dataset(name='CIFAR10', "
 'test_dataset=<armory.data.datasets.ArmoryDataGenerator object at '
 '0x7fe2e80ab050>, train_dataset=<armory.data.datasets.ArmoryDataGenerator '
 'object at 0x7fe2e9c92990>)')


 
The `name` field is shown first, with the details afterwards; this gives quick
visibility of "this is a CIFAR10 dataset" from the string representation of the
object. An `Evaluation` also requires a `Model` and a `Scenario`:

In [4]:
pprint(f"{baseline.model=}")
pprint(f"{baseline.scenario=}")

("baseline.model=Model(name='pytorch cifar', "
 'model=art.estimators.classification.pytorch.PyTorchClassifier(model=ModelWrapper(\n'
 '  (_model): Net(\n'
 '    (conv1): Conv2d(3, 4, kernel_size=(5, 5), stride=(1, 1))\n'
 '    (conv2): Conv2d(4, 10, kernel_size=(5, 5), stride=(1, 1))\n'
 '    (fc1): Linear(in_features=250, out_features=100, bias=True)\n'
 '    (fc2): Linear(in_features=100, out_features=10, bias=True)\n'
 '  )\n'
 '), loss=CrossEntropyLoss(), optimizer=Adam (\n'
 'Parameter Group 0\n'
 '    amsgrad: False\n'
 '    betas: (0.9, 0.999)\n'
 '    capturable: False\n'
 '    differentiable: False\n'
 '    eps: 1e-08\n'
 '    foreach: None\n'
 '    fused: None\n'
 '    lr: 0.003\n'
 '    maximize: False\n'
 '    weight_decay: 0\n'
 '), input_shape=(32, 32, 3), nb_classes=10, channels_first=False, '
 'clip_values=array([0., 1.], dtype=float32), preprocessing_defences=None, '
 'postprocessing_defences=None, '
 'preprocessing=StandardisationMeanStdPyTorch(mean=0.0, std=1.0, '
 

Here we are using a prefab ART model and the standard Armory Image Classification Task
scenario.

Because this is a "baseline" evaluation, it includes no defense, but does use a PGD
attack to calculate adversarial results.

The optional `metric` field tells Armory that we want to record additional metrics for
this evaluation. In this case, we are interested in the accuracy of the model on
adversarial examples.

In [5]:
pprint(f"{baseline.attack=}")
pprint(f"{baseline.metric=}")

('baseline.attack=Attack(function=<class '
 "'art.attacks.evasion.projected_gradient_descent.projected_gradient_descent.ProjectedGradientDescent'>, "
 "kwargs={'batch_size': 1, 'eps': 0.031, 'eps_step': 0.007, 'max_iter': 20, "
 "'num_random_init': 1, 'random_eps': False, 'targeted': False, 'verbose': "
 "False}, knowledge='white', use_label=True, type=None, generate_kwargs={}, "
 'sweep_params={}, targeted=False, targeted_labels={})')
("baseline.metric=Metric(profiler_type='basic', "
 "supported_metrics=['accuracy'], perturbation=['linf'], "
 "task=['categorical_accuracy'], means=True, record_metric_per_sample=False)")
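For readers unfamiliar with the `categorical_accuracy` task named above, the sketch below computes the usual definition: the fraction of samples whose highest-scoring class matches the ground-truth label. This is an illustration in plain Python, not Armory's implementation, and the scores and labels are made up for the example.

```python
from typing import Sequence


def categorical_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Made-up data: four samples, three classes.
labels = [0, 1, 2, 1]
benign_scores = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.6, 0.1],
]
# The same samples after a hypothetical perturbation flips two predictions.
adversarial_scores = [
    [0.9, 0.05, 0.05],
    [0.6, 0.3, 0.1],
    [0.5, 0.2, 0.3],
    [0.3, 0.6, 0.1],
]

benign_accuracy = categorical_accuracy(benign_scores, labels)
adversarial_accuracy = categorical_accuracy(adversarial_scores, labels)
print(benign_accuracy, adversarial_accuracy)
```

Comparing the two numbers is the point of running an attack in a baseline evaluation: the gap shows how much accuracy the perturbation costs.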


The `charmory.blocks` module is intended as a convenient parts cabinet that allows
users to quickly assemble evaluations from standard components. These components behave
as standard Python objects, so you can alter the canned definitions:

In [6]:
baseline.metric = None

This is useful if you'd prefer not to have additional tracking. The sub-components of an
evaluation are themselves objects, which can be composed at the user's discretion:

In [7]:
print(f"{charmory.blocks.cifar10.metric=}")

# let's put the metric back into baseline
baseline.metric = charmory.blocks.cifar10.metric

charmory.blocks.cifar10.metric=Metric(profiler_type='basic', supported_metrics=['accuracy'], perturbation=['linf'], task=['categorical_accuracy'], means=True, record_metric_per_sample=False)
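Note that assigning to `baseline.metric` changes the object that lives in `charmory.blocks.cifar10`, so every other user of that object in the same process sees the change. If you would rather leave the canned definition untouched, one option is to derive a modified copy. The sketch below uses `dataclasses.replace` on hypothetical stand-in classes; it assumes the real `Evaluation` is an ordinary dataclass, which this notebook suggests but which has not been checked against charmory here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Metric:
    # Hypothetical stand-in, not charmory's actual definition.
    task: str = "categorical_accuracy"


@dataclass
class Evaluation:
    # Hypothetical stand-in with just enough fields for the example.
    name: str
    metric: Optional[Metric] = None


# Plays the role of a shared, module-level canned evaluation.
shared = Evaluation(name="cifar_baseline", metric=Metric())

# replace() builds a new Evaluation; `shared` keeps its metric.
no_metric = replace(shared, name="cifar_no_metric", metric=None)

print(shared.metric)
print(no_metric.metric)
```

`replace` makes a shallow copy: fields that are not overridden still refer to the same component objects as the original, so mutating one of those components would affect both evaluations.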


Instantiation of the `Engine` class using the `Evaluation` object in `baseline` is
straightforward:

In [8]:
import charmory.engine
engine = charmory.engine.Engine(baseline)

In [9]:
result = engine.run()

Evaluation: 100%|██████████| 157/157 [09:53<00:00,  3.78s/it]

2023-07-26 16:36:16 10m METRIC  armory.instrument.instrument:_write:743 benign_mean_categorical_accuracy on benign examples w.r.t. ground truth labels: 0.0999
2023-07-26 16:36:16 10m METRIC  armory.instrument.instrument:_write:743 adversarial_mean_categorical_accuracy on adversarial examples w.r.t. ground truth labels: 0.0773





In [10]:
from pprint import pprint
pprint(result)

{'armory_version': '23.4.0.post113+g0e7be67a.d20230713',
 'evaluation': Evaluation(name='cifar_baseline',
                          description='Baseline cifar10 image classification',
                          model=Model(name='pytorch cifar',
                                      model=art.estimators.classification.pytorch.PyTorchClassifier(model=ModelWrapper(
  (_model): Net(
    (conv1): Conv2d(3, 4, kernel_size=(5, 5), stride=(1, 1))
    (conv2): Conv2d(4, 10, kernel_size=(5, 5), stride=(1, 1))
    (fc1): Linear(in_features=250, out_features=100, bias=True)
    (fc2): Linear(in_features=100, out_features=10, bias=True)
  )
), loss=CrossEntropyLoss(), optimizer=Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.003
    maximize: False
    weight_decay: 0
), input_shape=(32, 32, 3), nb_classes=10, channels_first=False, clip_values=array([0., 1.], dtype=float32)
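Because the full `pprint` of the result is dominated by the model repr, a shorter overview can be easier to scan. The helper below is a hypothetical convenience, not part of charmory, and it runs on a made-up dictionary that only borrows the two key names visible in the output above.

```python
def summarize(result: dict, width: int = 60) -> dict:
    """Map each key to the repr of its value, truncated to `width` characters."""
    summary = {}
    for key, value in result.items():
        text = repr(value)
        if len(text) > width:
            # Keep the start of the repr and mark the cut with an ellipsis.
            text = text[: width - 3] + "..."
        summary[key] = text
    return summary


# Made-up stand-in for the dictionary returned by engine.run().
fake_result = {
    "armory_version": "0.0.0-example",
    "evaluation": "Evaluation(" + "x" * 200 + ")",
}

summary = summarize(fake_result)
for key, text in summary.items():
    print(f"{key}: {text}")
```

With the real result, `summarize(result)` would show each top-level key alongside the first few dozen characters of its value.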

# Recap

There are many explanations and debug prints in this notebook, but the
working code is quite short:

In [None]:
import charmory.blocks.cifar10
import charmory.engine

baseline = charmory.blocks.cifar10.baseline
engine = charmory.engine.Engine(baseline)
result = engine.run()