# Tensorflow Fruits360 Patch demo for Securing AI Lab deployment

This notebook contains an end-to-end demonstration for the Securing AI Lab Architecture when it is deployed on the DGX workstation.

## Setup

**Note:** This demo is specifically for the NCCoE DGX Workstation with hostname `dgx-station-2`.

Port forwarding is required in order to run this demo.
The recommended port mapping is as follows:

- Map `localhost:30080` on laptop to `localhost:30080` on `dgx-station-2`
- Map `localhost:35000` on laptop to `localhost:35000` on `dgx-station-2`

A sample SSH config file that enables the above port forwarding is provided below,

> ⚠️ **Edits required**: replace `username` with your assigned username _on the NCCoE virtual machines_!

```conf
# vm hostname: jumphost001
Host nccoe-jumphost001
    Hostname 10.33.53.98
    User username
    Port 54131
    IdentityFile %d/.ssh/nccoe-vm

# vm hostname: dgx-station-2
Host nccoe-k8s-gpu002
    Hostname 192.168.1.28
    User username
    Port 22
    IdentityFile %d/.ssh/nccoe-vm
    ProxyJump nccoe-jumphost001
    LocalForward 30080 localhost:30080
    LocalForward 35000 localhost:35000
```

Now, connect to the NCCoE VPN and SSH into the DGX Workstation,

```bash
ssh nccoe-k8s-gpu002
```

Next, we import the necessary Python modules and ensure the proper environment variables are set so that all the code blocks will work as expected,

In [2]:
# Import packages from the Python standard library
import os
import pprint
import time
import warnings
from pathlib import Path
from typing import Tuple

# Please enter custom username here.
USERNAME = "howard"

# Filter out warning messages
warnings.filterwarnings("ignore")

# Default address for accessing the RESTful API service
RESTAPI_ADDRESS = "http://localhost:30080"

# Base API address
RESTAPI_API_BASE = f"{RESTAPI_ADDRESS}/api"

# Default address for accessing the MLFlow Tracking server
MLFLOW_TRACKING_URI = "http://localhost:35000"

# Path to workflows archive
WORKFLOWS_TAR_GZ = Path("workflows.tar.gz")

# Experiment name (note the username_ prefix convention)
EXPERIMENT_NAME = f"{USERNAME}_fruits360_adversarial_patches"

# Set MLFLOW_TRACKING_URI variable, used to connect to MLFlow Tracking service
if os.getenv("MLFLOW_TRACKING_URI") is None:
    os.environ["MLFLOW_TRACKING_URI"] = MLFLOW_TRACKING_URI

# Import third-party Python packages
import numpy as np
import requests
from mlflow.tracking import MlflowClient

# Import utils.py file
import utils

# Create random number generator
rng = np.random.default_rng(54399264723942495723666216079516778448)

Check that the Makefile works in your environment by executing the `bash` code block below,

In [3]:
%%bash

# Running this will just list the available rules defined in the demo's Makefile.
make

[1mAvailable rules:[m

[36mclean              [m Remove temporary files 
[36mdata               [m Download and prepare MNIST dataset 
[36minitdb             [m Initialize the RESTful API database 
[36mservices           [m Launch the Minio S3 and MLFlow Tracking services 
[36mteardown           [m Destroy service containers 
[36mworkflows          [m Create workflows tarball 


## Submit and run jobs

The jobs that we will be running are implemented in the Python source files under `src/`, which will be executed using the entrypoints defined in the `MLproject` file.
To get this information into the architecture, we need to package those files up into an archive and upload it to the lab API.
For convenience, the `Makefile` provides a rule for creating the archive file, just run `make workflows`,

In [4]:
%%bash

# Create the workflows.tar.gz file
make workflows

make: Nothing to be done for 'workflows'.


To connect with the endpoint, we will use a client class defined in the `utils.py` file that is able to connect with the lab's RESTful API using the HTTP protocol.
We connect using the client below,

In [5]:
restapi_client = utils.SecuringAIClient(address=RESTAPI_API_BASE)

We need to register an experiment under which to collect our job runs.
The code below checks if the relevant experiment exists.
If it does, then it just returns info about the experiment, if it doesn't, it then registers the new experiment.

In [6]:
response_experiment = restapi_client.get_experiment_by_name(name=EXPERIMENT_NAME)

if response_experiment is None or "Not Found" in response_experiment.get("message", []):
    response_experiment = restapi_client.register_experiment(name=EXPERIMENT_NAME)

response_experiment

{'experimentId': 11,
 'name': 'howard_fruits360_adversarial_patches',
 'lastModified': '2020-11-05T09:37:19.652250',
 'createdOn': '2020-11-05T09:37:19.652250'}

### Training Baseline Model

Next, we need to train our model.
We will be using the V100 GPUs that are available on the DGX Workstation, which we can use by submitting our job to the `"tensorflow_gpu"` queue.
We will train a VGG16 model on the Fruits360 dataset.

In [7]:
response_vgg16_train = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="train",
    entry_point_kwargs=" ".join([
        "-P batch_size=20",
        "-P register_model=True",
        "-P model_architecture=vgg16",
        "-P epochs=30",
        "-P data_dir_train=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Training",
        "-P data_dir_test=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Test",
    ]),
    queue="tensorflow_gpu",
    timeout="1h",
)

print("Training job for VGG16 neural network submitted")
print("")
pprint.pprint(response_vgg16_train)

Training job for VGG16 neural network submitted

{'createdOn': '2021-01-05T19:03:52.277837',
 'dependsOn': None,
 'entryPoint': 'train',
 'entryPointKwargs': '-P batch_size=20 -P register_model=True -P '
                     'model_architecture=vgg16 -P epochs=30 -P '
                     'data_dir_train=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Training '
                     '-P '
                     'data_dir_test=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Test',
 'experimentId': 11,
 'jobId': '6cd8f13c-25a3-48a1-8497-6fc9786c04af',
 'lastModified': '2021-01-05T19:03:52.277837',
 'mlflowRunId': None,
 'queueId': 2,
 'status': 'queued',
 'timeout': '1h',
 'workflowUri': 's3://workflow/4d86ebfa11294c0799a5ece24b0a164c/workflows.tar.gz'}


### Generating Adversarial Patches

Now that we have our trained model, next we will apply the adversarial patch attack on the network to generate adversarial images.
Then, after we have the adversarial images, we will use them to evaluate some standard machine learning metrics against both models.
This will give us a sense of the transferability of the attacks between models.

This specific workflow is an example of jobs that contain dependencies, as the metric evaluation jobs cannot start until the adversarial image generation jobs have completed.
The lab architecture allows users to declare one-to-many job dependencies like this, which we will use to queue up jobs to start immediately after the previous jobs have concluded.
The code below illustrates this by doing the following:

1. A job is submitted that generates adversarial images based on the VGG16 architecture.
1. We wait until the job starts and a MLFlow identifier is assigned, which we check by polling the API until we see the id appear.
1. Once we have an id returned to us from the API, we queue up the metrics evaluation jobs and declare the job dependency using the `depends_on` option.
1. The message "Dependent jobs submitted" will display once everything is queued up.

In [9]:
def mlflow_run_id_is_not_known(response_patch):
    return response_patch["mlflowRunId"] is None and response_patch["status"] not in [
        "failed",
        "finished",
    ]

# Get job ID for training job
while mlflow_run_id_is_not_known(response_vgg16_train):
    time.sleep(1)
    response_vgg16_train = restapi_client.get_job_by_id(response_vgg16_train['jobId'])

# Create Patches
response_vgg16_patches = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="gen_patch",
    entry_point_kwargs=" ".join(
        [
            f"-P model={EXPERIMENT_NAME}_vgg16/1",
            "-P model_architecture=vgg16",
            "-P data_dir=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Training",
            "-P num_patch_gen_samples=10",
            "-P num_patch=1",
        ]
    ),
    queue="tensorflow_gpu",
    depends_on=response_vgg16_train["jobId"],
)

print("Patch attack (VGG16 architecture) job submitted")
print("")
pprint.pprint(response_vgg16_patches)
print("")

Patch attack (VGG16 architecture) job submitted

{'createdOn': '2021-01-05T19:03:52.277837',
 'dependsOn': None,
 'entryPoint': 'train',
 'entryPointKwargs': '-P batch_size=20 -P register_model=True -P '
                     'model_architecture=vgg16 -P epochs=30 -P '
                     'data_dir_train=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Training '
                     '-P '
                     'data_dir_test=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Test',
 'experimentId': 11,
 'jobId': '6cd8f13c-25a3-48a1-8497-6fc9786c04af',
 'lastModified': '2021-01-05T19:03:56.324629',
 'mlflowRunId': '75de83db06634b4f98e9409f17fc28db',
 'queueId': 2,
 'status': 'started',
 'timeout': '1h',
 'workflowUri': 's3://workflow/4d86ebfa11294c0799a5ece24b0a164c/workflows.tar.gz'}



We can poll the status of the dependent jobs using the code below.
We should see the status of the jobs shift from "queued" to "started" and eventually become "finished".

### Deploying and Testing Adversarial Patches

Now we will apply the adversarial patches over our test set and evaluate the performance of the baseline model on the adversarial patches.

In [12]:
# Wait for Patch attack to finish.
while mlflow_run_id_is_not_known(response_vgg16_patches):
    time.sleep(1)
    response_vgg16_patches = restapi_client.get_job_by_id(response_vgg16_patches['jobId'])


# Deploy Patch attack on training set.
response_deploy_vgg16_patches_training = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="deploy_patch",
    entry_point_kwargs=" ".join(
        [
            f"-P run_id={response_vgg16_patches['mlflowRunId']}",
            f"-P model={EXPERIMENT_NAME}_vgg16/1",
            "-P model_architecture=vgg16",
            "-P data_dir=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Training",
        ]
    ),
    queue="tensorflow_gpu",
    depends_on=response_vgg16_patches["jobId"],
)

print("Patch deployment (VGG16 architecture) job submitted")
print("")
pprint.pprint(response_deploy_vgg16_patches_training)
print("")


while mlflow_run_id_is_not_known(response_deploy_vgg16_patches_training):
    time.sleep(1)
    response_deploy_vgg16_patches_training = restapi_client.get_job_by_id(response_deploy_vgg16_patches_training["jobId"])

    
# Deploy Patch attack on test set.
response_deploy_vgg16_patches_testing = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="deploy_patch",
    entry_point_kwargs=" ".join(
        [
            f"-P run_id={response_vgg16_patches['mlflowRunId']}",
            f"-P model={EXPERIMENT_NAME}_vgg16/1",
            "-P model_architecture=vgg16",
            "-P data_dir=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Test",
        ]
    ),
    queue="tensorflow_gpu",
    depends_on=response_vgg16_patches["jobId"],
)

print("Patch deployment (VGG16 architecture) job submitted")
print("")
pprint.pprint(response_deploy_vgg16_patches_testing)
print("")

# Deploy Patches
while mlflow_run_id_is_not_known(response_deploy_vgg16_patches_testing):
    time.sleep(1)
    response_deploy_vgg16_patches_testing = restapi_client.get_job_by_id(response_deploy_vgg16_patches_testing["jobId"])
    

Patch deployment (VGG16 architecture) job submitted

{'createdOn': '2021-01-06T07:43:22.446223',
 'dependsOn': 'ceed57ae-3372-49ac-aa82-5db2f20c0304',
 'entryPoint': 'deploy_patch',
 'entryPointKwargs': '-P run_id=80fee92e26994f8994c0e78d70d2d946 -P '
                     'model=howard_fruits360_adversarial_patches_vgg16/1 -P '
                     'model_architecture=vgg16 -P '
                     'data_dir=/nfs/data/Fruits360-Kaggle-2019/fruits-360/Training',
 'experimentId': 11,
 'jobId': '7c9d0bb9-779a-461f-a257-e22e5a2c823f',
 'lastModified': '2021-01-06T07:43:22.446223',
 'mlflowRunId': None,
 'queueId': 2,
 'status': 'queued',
 'timeout': '24h',
 'workflowUri': 's3://workflow/212c5e01b31241b59218c8c7c76b8b2f/workflows.tar.gz'}

Patch deployment (VGG16 architecture) job submitted

{'createdOn': '2021-01-06T07:43:26.555919',
 'dependsOn': 'ceed57ae-3372-49ac-aa82-5db2f20c0304',
 'entryPoint': 'deploy_patch',
 'entryPointKwargs': '-P run_id=80fee92e26994f8994c0e78d70d2d946 -P '
  

In [13]:
# Check patched dataset results   

response_infer_vgg16_patch = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="infer",
    entry_point_kwargs=" ".join(
        [
            f"-P run_id={response_deploy_vgg16_patches_testing['mlflowRunId']}",
            f"-P model={EXPERIMENT_NAME}_vgg16/1",
            "-P batch_size=512",
            "-P model_architecture=vgg16",
            "-P dataset_tar_name=adversarial_patch_dataset.tar.gz",
            "-P dataset_name=adv_patch_dataset",
        ]
    ),
    queue="tensorflow_gpu",
    depends_on=response_deploy_vgg16_patches_testing["jobId"],
)

print("Patch evaluation (VGG16 architecture) job submitted")
print("")
pprint.pprint(response_infer_vgg16_patch)
print("")

Patch evaluation (VGG16 architecture) job submitted

{'createdOn': '2021-01-06T07:43:41.731729',
 'dependsOn': 'c6f95f33-5698-4fab-9431-54aa0131cf4f',
 'entryPoint': 'infer',
 'entryPointKwargs': '-P run_id=fcdd79ffac914956b84c4c2bde6aa067 -P '
                     'model=howard_fruits360_adversarial_patches_vgg16/1 -P '
                     'batch_size=512 -P model_architecture=vgg16 -P '
                     'dataset_tar_name=adversarial_patch_dataset.tar.gz -P '
                     'dataset_name=adv_patch_dataset',
 'experimentId': 11,
 'jobId': '1a39d698-9be1-430a-9032-33c19b3c0343',
 'lastModified': '2021-01-06T07:43:41.731729',
 'mlflowRunId': None,
 'queueId': 2,
 'status': 'queued',
 'timeout': '24h',
 'workflowUri': 's3://workflow/99ed3410c94c446980668cc2be4906b1/workflows.tar.gz'}



### Adversarial Training Defense

Finally, we will train a new copy of the VGG16 model on training set that contains adversarial patches.

In [16]:
response_patches_adv_training = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="train",
    entry_point_kwargs=" ".join(
        [
            f"-P testing_dataset_run_id={response_deploy_vgg16_patches_testing['mlflowRunId']}",
            f"-P training_dataset_run_id={response_deploy_vgg16_patches_training['mlflowRunId']}",
            "-P batch_size=256",
            "-P register_model=True",
            "-P model_architecture=vgg16",
            "-P model_tag=adversarial_patch",
            "-P epochs=10",
            "-P data_dir_train=/nfs/data/Mnist/training",
            "-P data_dir_test=/nfs/data/Mnist/testing",
            "-P load_dataset_from_mlruns=True",
        ]
    ),
    queue="tensorflow_gpu",
    depends_on=response_deploy_vgg16_patches_training["jobId"],
)

print("Patch adversarial training (VGG16 architecture) job submitted")
print("")
pprint.pprint(response_patches_adv_training)
print("")

Patch adversarial training (VGG16 architecture) job submitted

{'createdOn': '2021-01-06T09:14:31.382495',
 'dependsOn': '7c9d0bb9-779a-461f-a257-e22e5a2c823f',
 'entryPoint': 'train',
 'entryPointKwargs': '-P '
                     'testing_dataset_run_id=fcdd79ffac914956b84c4c2bde6aa067 '
                     '-P '
                     'training_dataset_run_id=af0147a846dc4882b1f13ff4510813f5 '
                     '-P batch_size=256 -P register_model=True -P '
                     'model_architecture=vgg16 -P model_tag=adversarial_patch '
                     '-P epochs=10 -P data_dir_train=/nfs/data/Mnist/training '
                     '-P data_dir_test=/nfs/data/Mnist/testing -P '
                     'load_dataset_from_mlruns=True',
 'experimentId': 11,
 'jobId': '92b6c71f-f25f-4192-b8f3-aedddce29af7',
 'lastModified': '2021-01-06T09:14:31.382495',
 'mlflowRunId': None,
 'queueId': 2,
 'status': 'queued',
 'timeout': '24h',
 'workflowUri': 's3://workflow/66c95e849bd94645bcfbe450

## Querying the MLFlow Tracking Service

Currently the lab API can only be used to register experiments and start jobs, so if users wish to extract their results programmatically, they can use the `MlflowClient()` class from the `mlflow` Python package to connect and query their results.
Since we captured the run ids generated by MLFlow, we can easily retrieve the data logged about one of our jobs and inspect the results.
To start the client, we simply need to run,

In [None]:
mlflow_client = MlflowClient()

The client uses the environment variable `MLFLOW_TRACKING_URI` to figure out how to connect to the MLFlow Tracking Service, which we configured near the top of this notebook.
To query the results of one of our runs, we just need to pass the run id to the client's `get_run()` method.
As an example, let's query the run results for the patch attack applied to the VGG16 architecture,

In [None]:
run_adv_patches = mlflow_client.get_run(response_patches_adv_training["mlflowRunId"])

If the request completed successfully, we should now be able to query data collected during the run.
For example, to review the collected metrics, we just use,

In [None]:
pprint.pprint(run_adv_patches.data.metrics)

To review the run's parameters, we use,

In [None]:
pprint.pprint(run_adv_patches.data.params)

To review the run's tags, we use,

In [None]:
pprint.pprint(run_adv_patches.data.tags)

There are many things you can query using the MLFlow client.
[The MLFlow documentation gives a full overview of the methods that are available](https://www.mlflow.org/docs/latest/python_api/mlflow.tracking.html#mlflow.tracking.MlflowClient).