# TensorFlow ImageNet ResNet50 Pixel Threshold Demo

This notebook contains an end-to-end demonstration of Dioptra that can be run on any modern laptop.
Please see the [example README](README.md) for instructions on how to prepare your environment for running this example.

## Setup

Below we import the necessary Python modules and ensure the proper environment variables are set so that all the code blocks will work as expected.

In [None]:
# Import packages from the Python standard library
import importlib.util
import os
import sys
import pprint
import time
import warnings
from pathlib import Path


def register_python_source_file(module_name: str, filepath: Path) -> None:
    """Import a source file directly.

    Args:
        module_name: The module name to associate with the imported source file.
        filepath: The path to the source file.

    Notes:
        Adapted from the following implementation in the Python documentation:
        https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
    """
    spec = importlib.util.spec_from_file_location(module_name, str(filepath))
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)


# Filter out warning messages
warnings.filterwarnings("ignore")

# Experiment name
EXPERIMENT_NAME = "imagenet_pt_defense"

# Default address for accessing the RESTful API service
RESTAPI_ADDRESS = "http://localhost:80"

# Set DIOPTRA_RESTAPI_URI variable if not defined, used to connect to RESTful API service
if os.getenv("DIOPTRA_RESTAPI_URI") is None:
    os.environ["DIOPTRA_RESTAPI_URI"] = RESTAPI_ADDRESS
    
# Default address for accessing the MLflow Tracking server
MLFLOW_TRACKING_URI = "http://localhost:35000"

# Set MLFLOW_TRACKING_URI variable if not defined, used to connect to MLflow Tracking service
if os.getenv("MLFLOW_TRACKING_URI") is None:
    os.environ["MLFLOW_TRACKING_URI"] = MLFLOW_TRACKING_URI

# Path to workflows archive
WORKFLOWS_TAR_GZ = Path("workflows.tar.gz")

# Register the examples/scripts directory as a Python module
register_python_source_file("scripts", Path("..", "scripts", "__init__.py"))

from scripts.client import DioptraClient
from scripts.utils import make_tar

# Import third-party Python packages
import numpy as np
from mlflow.tracking import MlflowClient

# Create random number generator
rng = np.random.default_rng(54399264723942495723666216079516778448)

In [None]:
mlflow_queue = "tensorflow_gpu"
data_path_imagenet = "/dioptra/data/ImageNet-Kaggle"

In [None]:
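def mlflow_run_id_is_not_known(response):
    """Return True while the job has no MLflow run ID and has not failed or finished."""
    return response["mlflowRunId"] is None and response["status"] not in [
        "failed",
        "finished",
    ]


def print_response(jobtype, jobname, response):
    """Pretty-print the REST API response for a newly submitted job."""
    print(f"{jobtype} for job {jobname} submitted.")
    print("")
    pprint.pprint(response)
    print("")


def wait_for_job(response):
    """Poll the REST API once per second until the job's MLflow run ID is known.

    Relies on the global `restapi_client` created later in the notebook.
    """
    while mlflow_run_id_is_not_known(response):
        time.sleep(1)
        response = restapi_client.get_job_by_id(response["jobId"])
    return response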

## Dataset

We obtained a copy of the ImageNet dataset when we ran the `download_data.py` script. If you have not done so already, see [How to Obtain Common Datasets](https://pages.nist.gov/dioptra/getting-started/acquiring-datasets.html).
The training and testing images for the ImageNet dataset are stored within the `/dioptra/data/ImageNet-Kaggle` directory as PNG files that are organized into the following folder structure:

    ImageNet-Kaggle/
    ├── metadata/
    │   ├── image_sets/
    │   └── synset_mapping.txt
    ├── testing/
    │   ├── annotations/
    │   │   ├── n01440764/
    │   │   ├── n01443537/
    │   │   ...
    │   └── images/
    │       ├── n01440764/
    │       ├── n01443537/
    │       ...
    └── training/
        ├── annotations/
        │   ├── n01440764/
        │   ├── n01443537/
        │   ...
        └── images/
            ├── n01440764/
            ├── n01443537/
            ...


The subfolders under `training/images/` and `testing/images/` are named after the classification labels for the images in the dataset (WordNet synset IDs such as `n01440764`).
This folder structure is a standardized way to encode the label information, and many libraries can make use of it, including the TensorFlow library that we are using for this particular demo.
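
To make the directory-per-class convention concrete, here is a minimal sketch that derives an integer class index from subfolder names, using a temporary directory that mimics `training/images/`. The helper name `class_index_from_directory` is hypothetical. Keras utilities such as `tf.keras.utils.image_dataset_from_directory` infer labels from subfolder names in a similar way, but this sketch uses only the standard library.

```python
import tempfile
from pathlib import Path


def class_index_from_directory(images_dir):
    """Map each class subfolder name to an integer index, in sorted order."""
    class_names = sorted(p.name for p in Path(images_dir).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_names)}


# Build a tiny stand-in for training/images/ with two synset folders.
with tempfile.TemporaryDirectory() as tmp:
    images_dir = Path(tmp, "training", "images")
    for synset in ("n01443537", "n01440764"):
        (images_dir / synset).mkdir(parents=True)
    class_index = class_index_from_directory(images_dir)

print(class_index)
```

Sorting the folder names gives a deterministic label order regardless of how the filesystem lists the directories.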

## Submit and run jobs

The entrypoints that we will be running in this example are implemented in the Python source files under `src/` and the `src/MLproject` file.
To run these entrypoints within Dioptra's architecture, we need to package those files up into an archive and submit it to the Dioptra RESTful API to create a new job.
For convenience, we provide the `make_tar` helper function defined in `examples/scripts/utils.py`.

In [None]:
make_tar(["src"], WORKFLOWS_TAR_GZ)
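
For readers curious about what such a packaging step involves, the following is a rough sketch built on the standard-library `tarfile` module. It is not the actual `make_tar` implementation from `examples/scripts/utils.py`, which may differ; the name `make_tar_sketch` and its `working_dir` parameter are assumptions made for illustration.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tar_sketch(source_dirs, tarball_path, working_dir="."):
    """Bundle directories into a gzip-compressed tar archive.

    Hypothetical stand-in for make_tar; the real helper may behave differently.
    """
    tarball_path = Path(tarball_path)
    with tarfile.open(str(tarball_path), "w:gz") as archive:
        for source_dir in source_dirs:
            # arcname keeps paths inside the archive relative (e.g. "src/...").
            archive.add(str(Path(working_dir, source_dir)), arcname=str(source_dir))
    return tarball_path


# Exercise the sketch on a throwaway directory containing a single file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src")
    src.mkdir()
    (src / "MLproject").write_text("name: demo\n")
    tarball = make_tar_sketch(["src"], Path(tmp, "workflows.tar.gz"), working_dir=tmp)
    with tarfile.open(str(tarball), "r:gz") as archive:
        archived_names = sorted(archive.getnames())

print(archived_names)
```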

To connect to the endpoint, we use a client class defined in `examples/scripts/client.py` that communicates with the Dioptra RESTful API over HTTP.
The client reads the `DIOPTRA_RESTAPI_URI` environment variable, which we configured at the top of the notebook, to determine the address of the Dioptra RESTful API.

In [None]:
restapi_client = DioptraClient()
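
As an illustration of how a client might turn that environment variable into a request URL, here is a small sketch. It does not reproduce `DioptraClient`; the function `resolve_api_url` and the endpoint path `api/job/` are hypothetical, chosen only to show the pattern of reading a base address with a fallback default.

```python
import os
from urllib.parse import urljoin


def resolve_api_url(endpoint, environ=os.environ, default="http://localhost:80"):
    """Join the configured base address with an endpoint path.

    Hypothetical helper: reads DIOPTRA_RESTAPI_URI from `environ` and falls
    back to `default` when the variable is not set.
    """
    base = environ.get("DIOPTRA_RESTAPI_URI", default)
    # Normalize slashes so the join never drops or doubles a path segment.
    return urljoin(base.rstrip("/") + "/", endpoint.lstrip("/"))


example_url = resolve_api_url(
    "api/job/", environ={"DIOPTRA_RESTAPI_URI": "http://localhost:80"}
)
fallback_url = resolve_api_url("/api/job/", environ={})
print(example_url)
```

Accepting `environ` as a parameter keeps the sketch testable without modifying the real process environment.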

We need to register an experiment under which to collect our job runs.
The code below checks whether the experiment named by `EXPERIMENT_NAME` (here, `"imagenet_pt_defense"`) exists.
If it does, the code returns information about the experiment; if it does not, the code registers the new experiment.

In [None]:
response_experiment = restapi_client.get_experiment_by_name(name=EXPERIMENT_NAME)

if response_experiment is None or "Not Found" in response_experiment.get("message", []):
    response_experiment = restapi_client.register_experiment(name=EXPERIMENT_NAME)

response_experiment

## Pixel Threshold ResNet50 Model Setup and Attack

In this section of the demo, we load the baseline ResNet50 model and evaluate its performance on a subset of the ImageNet validation data.

First, we initialize the pre-trained ResNet50 model.

In [None]:
response_init_model = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="init_model",
    entry_point_kwargs=" ".join([
        "-P batch_size=20",
        "-P model_architecture=resnet50",
        f"-P register_model_name={EXPERIMENT_NAME}_pretrained_resnet50",
        "-P imagenet_preprocessing=true",
        "-P image_size=224,224,3",
        f"-P data_dir={data_path_imagenet}/testing",
    ]),
    queue=mlflow_queue,
    timeout="1h",
)

print_response("Model initialization", "ImageNet evaluation", response_init_model)
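
The `entry_point_kwargs` argument above is a single string of space-separated `-P key=value` pairs. As a sketch, a small helper can build that string from a dictionary, which may be easier to maintain as the parameter list grows. The name `format_entry_point_kwargs` is hypothetical and not part of the Dioptra client.

```python
def format_entry_point_kwargs(params):
    """Render a dict as a space-separated string of "-P key=value" pairs."""
    return " ".join(f"-P {key}={value}" for key, value in params.items())


kwargs = format_entry_point_kwargs(
    {
        "batch_size": 20,
        "model_architecture": "resnet50",
        "image_size": "224,224,3",
    }
)
print(kwargs)
```

Values are inserted with their default string form, so a Python `True` would render as `True` rather than the lowercase `true` used above; pass such values as strings. The sketch also does not quote values that contain spaces.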


Next, we create the adversarial Pixel Threshold dataset.

In [None]:
response_pt_resnet50_attack = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="pt",
    entry_point_kwargs=" ".join(
        [
            f"-P model_name={EXPERIMENT_NAME}_pretrained_resnet50",
            "-P model_version=None",
            f"-P data_dir={data_path_imagenet}",
            "-P image_size=224,224,3",
            "-P batch_size=2",
            "-P th=1",
            "-P es=1",
        ]
    ),
    queue=mlflow_queue,
    timeout="96h",
    depends_on=response_init_model['jobId']
)
print_response("Attack", "Pixel Threshold (Resnet 50)", response_pt_resnet50_attack)
response_pt_resnet50_attack = wait_for_job(response_pt_resnet50_attack)
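
Once `wait_for_job` returns, the response carries an `mlflowRunId` that can be used to look up the run on the MLflow Tracking server. The sketch below shows one way to summarize a run through the `get_run` method of `MlflowClient`. The helper name `summarize_run` is hypothetical, and the `_FakeMlflowClient` class with its placeholder values exists only so the sketch runs without a tracking server.

```python
from types import SimpleNamespace


def summarize_run(client, run_id):
    """Return the status, parameters, and metrics recorded for an MLflow run."""
    run = client.get_run(run_id)
    return {
        "status": run.info.status,
        "params": dict(run.data.params),
        "metrics": dict(run.data.metrics),
    }


class _FakeMlflowClient:
    """Stand-in for MlflowClient; returns placeholder values, not real results."""

    def get_run(self, run_id):
        return SimpleNamespace(
            info=SimpleNamespace(run_id=run_id, status="RUNNING"),
            data=SimpleNamespace(params={"batch_size": "2"}, metrics={}),
        )


summary = summarize_run(_FakeMlflowClient(), "example-run-id")
print(summary)
```

In this notebook, the equivalent call would be `summarize_run(MlflowClient(), response_pt_resnet50_attack["mlflowRunId"])`, assuming the job has started and the tracking server configured in `MLFLOW_TRACKING_URI` is reachable.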