# TensorFlow Adversarial Embedding MNIST demo for a Dioptra deployment

This demo will cover the adversarial clean label backdoor attack on an MNIST-LeNet model.
The following two sections cover experiment setup and are similar across all demos.
Please see the [example README](README.md) for instructions on how to prepare your environment for running this example.

## Setup: Experiment Name and MNIST Dataset

Here we will import the necessary Python modules and ensure the proper environment variables are set so that all the code blocks will work as expected.

**Important: Users will need to verify or update the following parameters:**

- Ensure that the `USERNAME` parameter is set to your own name.
- Ensure that the `DATASET_DIR` parameter is set to the location of the MNIST dataset directory. The default location is `/dioptra/data/Mnist`.
- (Optional) Set the `EXPERIMENT_NAME` parameter to your own preferred experiment name.

The remaining parameters can be modified to change the addresses of the RESTful API and the MLflow Tracking server.

In [None]:
# Import packages from the Python standard library
import importlib.util
import os
import sys
import pprint
import time
import warnings
from pathlib import Path

def register_python_source_file(module_name: str, filepath: Path) -> None:
    """Import a source file directly.

    Args:
        module_name: The module name to associate with the imported source file.
        filepath: The path to the source file.

    Notes:
        Adapted from the following implementation in the Python documentation:
        https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
    """
    spec = importlib.util.spec_from_file_location(module_name, str(filepath))
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)


# Filter out warning messages
warnings.filterwarnings("ignore")

# Ensure that the dataset location is properly set here.
DATASET_DIR = "/dioptra/data/Mnist"

# Experiment name (note the username_ prefix convention)
EXPERIMENT_NAME = "mnist_clean_label_backdoor"

# Default address for accessing the RESTful API service
RESTAPI_ADDRESS = "http://localhost:80"

# Set DIOPTRA_RESTAPI_URI variable if not defined, used to connect to RESTful API service
if os.getenv("DIOPTRA_RESTAPI_URI") is None:
    os.environ["DIOPTRA_RESTAPI_URI"] = RESTAPI_ADDRESS

# Default address for accessing the MLflow Tracking server
MLFLOW_TRACKING_URI = "http://localhost:35000"

# Set MLFLOW_TRACKING_URI variable if not defined, used to connect to MLflow Tracking service
if os.getenv("MLFLOW_TRACKING_URI") is None:
    os.environ["MLFLOW_TRACKING_URI"] = MLFLOW_TRACKING_URI

# Path to workflows archive
WORKFLOWS_TAR_GZ = Path("workflows.tar.gz")

# Register the examples/scripts directory as a Python module
register_python_source_file("scripts", Path("..", "scripts", "__init__.py"))

from scripts.client import DioptraClient
from scripts.utils import make_tar

# Import third-party Python packages
import numpy as np
from mlflow.tracking import MlflowClient

# Create random number generator
rng = np.random.default_rng(54399264723942495723666216079516778448)

## Submit and run jobs

The entrypoints that we will be running in this example are implemented in the Python source files under `src/` and the `src/MLproject` file.
To run these entrypoints within Dioptra's architecture, we need to package those files up into an archive and submit it to the Dioptra RESTful API to create a new job.
For convenience, we provide the `make_tar` helper function defined in `examples/scripts/utils.py`.

In [None]:
make_tar(["src"], WORKFLOWS_TAR_GZ)
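
As a rough illustration of the packaging step, the sketch below bundles a directory into a gzip-compressed tarball using only the standard library. It is an assumption about what a helper like `make_tar` might do, not the implementation in `examples/scripts/utils.py`; the names `make_tar_sketch` and `list_members` are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tar_sketch(source_dirs, tarball_path):
    """Bundle each directory in source_dirs into a gzip-compressed tarball."""
    tarball_path = Path(tarball_path)
    with tarfile.open(tarball_path, "w:gz") as archive:
        for source_dir in source_dirs:
            source_dir = Path(source_dir)
            # arcname keeps only the directory name, so archive paths are relative.
            archive.add(str(source_dir), arcname=source_dir.name)
    return tarball_path


def list_members(tarball_path):
    """Return the sorted member names stored in the tarball."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return sorted(archive.getnames())


# Usage against a throwaway directory standing in for `src/`.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "MLproject").write_text("name: demo\n")
    (src / "train.py").write_text("print('train')\n")
    demo_members = list_members(
        make_tar_sketch([src], Path(tmp) / "workflows.tar.gz")
    )
```

Storing paths relative to the directory name keeps the archive portable across machines with different absolute paths.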

To connect with the endpoint, we will use a client class defined in `examples/scripts/client.py` that connects to the Dioptra RESTful API over HTTP.
The client reads the `DIOPTRA_RESTAPI_URI` environment variable, which we configured at the top of the notebook, to locate the Dioptra RESTful API.

In [None]:
restapi_client = DioptraClient()

We need to register an experiment under which to collect our job runs.
The code below checks if the relevant experiment exists.
If it does, the code returns information about the existing experiment; otherwise, it registers the new experiment.

In [None]:
response_experiment = restapi_client.get_experiment_by_name(name=EXPERIMENT_NAME)

if response_experiment is None or "Not Found" in response_experiment.get("message", []):
    response_experiment = restapi_client.register_experiment(name=EXPERIMENT_NAME)

response_experiment
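
The lookup-then-register logic above can be wrapped in a small function so that rerunning the cell is harmless. The sketch below exercises that pattern against an in-memory stand-in; `InMemoryExperimentClient`, its `experimentId` field, and `get_or_register_experiment` are hypothetical and only mimic the two client methods used in this notebook.

```python
class InMemoryExperimentClient:
    """Hypothetical stand-in for the REST client; stores experiments in a dict."""

    def __init__(self):
        self._experiments = {}

    def get_experiment_by_name(self, name):
        # Mimic a "Not Found" payload when the experiment is missing.
        if name not in self._experiments:
            return {"message": "Not Found"}
        return self._experiments[name]

    def register_experiment(self, name):
        record = {"experimentId": len(self._experiments) + 1, "name": name}
        self._experiments[name] = record
        return record


def get_or_register_experiment(client, name):
    """Return the existing experiment record, registering it first if needed."""
    response = client.get_experiment_by_name(name=name)
    if response is None or "Not Found" in response.get("message", ""):
        response = client.register_experiment(name=name)
    return response


client = InMemoryExperimentClient()
first = get_or_register_experiment(client, "mnist_clean_label_backdoor")
second = get_or_register_experiment(client, "mnist_clean_label_backdoor")
```

The second call finds the record created by the first, so no duplicate experiment is registered.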

The following helper functions poll the job status until the job has completed or a run ID is available.
The run ID is needed to link dependencies between jobs.

In [None]:
def mlflow_run_id_is_not_known(job_response):
    return job_response["mlflowRunId"] is None and job_response["status"] not in [
        "failed",
        "finished",
    ]


def get_run_id(job_response):
    while mlflow_run_id_is_not_known(job_response):
        time.sleep(1)
        job_response = restapi_client.get_job_by_id(job_response["jobId"])
        
    return job_response


def wait_until_finished(job_response):
    # First make sure job has started.
    job_response = get_run_id(job_response)
    
    # Next re-check job until it has stopped running.
    while job_response["status"] not in ["failed", "finished"]:
        time.sleep(1)
        job_response = restapi_client.get_job_by_id(job_response["jobId"])
    
    return job_response


# Helper function for viewing MLflow results
def get_mlflow_results(job_response):
    mlflow_client = MlflowClient()
    job_response = wait_until_finished(job_response)

    # A failed job has no metrics to report.
    if job_response["status"] == "failed":
        return {}

    run = mlflow_client.get_run(job_response["mlflowRunId"])

    # Poll until at least one metric has been logged for the run.
    while len(run.data.metrics) == 0:
        time.sleep(1)
        run = mlflow_client.get_run(job_response["mlflowRunId"])

    return run


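def print_mlflow_results(response):
    results = get_mlflow_results(response)

    # get_mlflow_results returns an empty dict when the job failed.
    if isinstance(results, dict):
        print("Job failed; no MLflow metrics to display.")
        return

    pprint.pprint(results.data.metrics)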
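
The polling helpers above loop until the job reaches a terminal state, which can block indefinitely if a job never starts. As one possible variation, the sketch below adds a deadline to the same polling pattern. The function `poll_until` and its parameters are hypothetical and not part of the Dioptra client; the fake fetch function stands in for `restapi_client.get_job_by_id`.

```python
import time


def poll_until(fetch, is_done, interval=1.0, timeout=60.0):
    """Call fetch() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    result = fetch()
    while not is_done(result):
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal state in time")
        time.sleep(interval)
        result = fetch()
    return result


# Fake job source that walks through three states, one per call.
_states = iter(["queued", "started", "finished"])


def fake_fetch():
    return {"jobId": "demo", "status": next(_states)}


final = poll_until(
    fake_fetch,
    lambda job: job["status"] in ("failed", "finished"),
    interval=0.0,
)
```

Passing the fetch function as an argument keeps the loop testable without a running REST API.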

## MNIST Training: Baseline Model

Next, we need to train our baseline model that will serve as a reference point for the effectiveness of our attacks.
We will be submitting our job to the `"tensorflow_cpu"` queue.

In [None]:
response_le_net_train = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="train",
    entry_point_kwargs=" ".join([
        "-P batch_size=256",
        f"-P register_model_name={EXPERIMENT_NAME}_le_net",
        "-P model_architecture=le_net",
        "-P epochs=30",
    ]),
    queue="tensorflow_cpu",
    timeout="1h",    
)

print("Training job for LeNet-5 neural network submitted")
print("")
pprint.pprint(response_le_net_train)

response_le_net_train = get_run_id(response_le_net_train)
print_mlflow_results(response_le_net_train)
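
Each job in this notebook passes its parameters as a single string of `-P key=value` pairs joined by spaces. A small helper can build that string from a dictionary, which may be easier to read and modify. The function `build_entry_point_kwargs` below is a hypothetical convenience, not part of the Dioptra scripts, and it assumes that keys and values contain no spaces.

```python
def build_entry_point_kwargs(params):
    """Render a mapping as a space-separated string of `-P key=value` pairs."""
    return " ".join(f"-P {key}={value}" for key, value in params.items())


train_kwargs = build_entry_point_kwargs(
    {
        "batch_size": 256,
        "register_model_name": "mnist_clean_label_backdoor_le_net",
        "model_architecture": "le_net",
        "epochs": 30,
    }
)
```

The resulting string could be passed as `entry_point_kwargs` in place of the `" ".join([...])` expression used above.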

In [None]:
# Train a special model for making poisons
response_le_net_train_robust = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="train_madry_pgd",
    entry_point_kwargs=" ".join([
        "-P batch_size=256",
        f"-P register_model_name={EXPERIMENT_NAME}_robust_le_net",
        "-P model_architecture=le_net",
        "-P epochs=10",
    ]),
    queue="tensorflow_cpu",
    timeout="1h",
)

print("Training job for LeNet-5 neural network submitted")
print("")
pprint.pprint(response_le_net_train_robust)

response_le_net_train_robust = get_run_id(response_le_net_train_robust)
print_mlflow_results(response_le_net_train_robust)

### Generating Poisoned Images

Now we will create our set of poisoned images.
Start by submitting the poison generation job below.

In [None]:
## Create poisoned test images.
response_gen_poison_le_net_test = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="gen_poison_data",
    entry_point_kwargs=" ".join(
        [
            "-P batch_size=100",
            "-P target_class=1",
            "-P poison_fraction=1",
            "-P label_type=test"
        ]
    ),
    queue="tensorflow_cpu",
    depends_on=response_le_net_train["jobId"],
)

print("Backdoor poison attack (LeNet-5 architecture) job submitted")
print("")
pprint.pprint(response_gen_poison_le_net_test)
print("")

response_gen_poison_le_net_test = get_run_id(response_gen_poison_le_net_test)

In [None]:
## Create poisoned training images (clean_label)
response_gen_poison_le_net_train_clean = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="gen_poison_clean_data",
    entry_point_kwargs=" ".join(
        [
            f"-P model_name={EXPERIMENT_NAME}_robust_le_net",
            "-P model_version=none",
            "-P batch_size=200",
            "-P target_index=1",
            "-P poison_fraction=0.33",
            "-P label_type=train"
        ]
    ),
    queue="tensorflow_cpu",
    depends_on=response_le_net_train_robust["jobId"],
)

print("Backdoor poison attack (LeNet-5 architecture) job submitted")
print("")
pprint.pprint(response_gen_poison_le_net_train_clean)
print("")

response_gen_poison_le_net_train_clean = get_run_id(response_gen_poison_le_net_train_clean)
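
To make the idea of clean label poisoning more concrete, the sketch below stamps a small trigger patch on a fraction of the images that already belong to the target class and leaves every label unchanged. This is a simplified illustration under stated assumptions: the trigger shape, the function names, and the interpretation of `poison_fraction` as a fraction of target-class images are all hypothetical, and the sketch omits any adversarial perturbation step that the `gen_poison_clean_data` entry point may apply using the robust model.

```python
import random


def stamp_trigger(image, value=1.0, size=3):
    """Return a copy of a 2D image with a size x size patch set in the bottom-right corner."""
    stamped = [row[:] for row in image]
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[r]) - size, len(stamped[r])):
            stamped[r][c] = value
    return stamped


def poison_clean_label(images, labels, target_class, poison_fraction, seed=0):
    """Stamp the trigger on a fraction of target-class images; labels stay untouched."""
    rng = random.Random(seed)
    target_indices = [i for i, label in enumerate(labels) if label == target_class]
    num_poison = int(poison_fraction * len(target_indices))
    chosen = set(rng.sample(target_indices, num_poison))
    poisoned = [
        stamp_trigger(img) if i in chosen else img for i, img in enumerate(images)
    ]
    return poisoned, list(labels), sorted(chosen)


# Nine blank 28x28 images; six of them carry the target label 1.
images = [[[0.0] * 28 for _ in range(28)] for _ in range(9)]
labels = [1, 1, 1, 1, 1, 1, 0, 2, 3]
poisoned_images, poisoned_labels, poisoned_indices = poison_clean_label(
    images, labels, target_class=1, poison_fraction=0.5
)
```

Because the labels are never modified, a poisoned sample still looks correctly labeled to someone inspecting the dataset, which is what distinguishes a clean label attack from a label-flipping backdoor.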

## MNIST Training: Poisoned Model using a Clean Label technique

Next we will train our poisoned model using a clean label technique. 

In [None]:
# Now train a new model on the poisoned clean label images
response_le_net_train_backdoor_model_clean = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="train_on_run_dataset",
    entry_point_kwargs=" ".join([
        "-P batch_size=256",
        f"-P register_model_name={EXPERIMENT_NAME}_data_poison_le_net",
        "-P model_architecture=le_net",
        "-P epochs=30",
        "-P load_dataset_from_mlruns=true",
        f"-P dataset_run_id_training={response_gen_poison_le_net_train_clean['mlflowRunId']}",
        "-P adv_tar_name=adversarial_poison.tar.gz",
        "-P adv_data_dir=adv_poison_data" 
    ]),
    depends_on=response_gen_poison_le_net_train_clean["jobId"],
    queue="tensorflow_cpu",  
    timeout="1h",
)

print("Training job for LeNet-5 neural network submitted")
print("")
pprint.pprint(response_le_net_train_backdoor_model_clean)

response_le_net_train_backdoor_model_clean = get_run_id(response_le_net_train_backdoor_model_clean)
print_mlflow_results(response_le_net_train_backdoor_model_clean)

## Model Evaluation: Poisoned vs Regular Models on Backdoor-Poisoned Images

Below we will compare the results of the regular model vs poisoned-backdoor model on backdoor test images.

In [None]:
# Inference: Model trained on poisoned backdoor attack
response_infer_pos_model_clean = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="infer",
    entry_point_kwargs=" ".join(
        [
            f"-P run_id={response_gen_poison_le_net_test['mlflowRunId']}",
            f"-P model_name={EXPERIMENT_NAME}_data_poison_le_net",
            "-P model_version=none",
            "-P batch_size=512",
            "-P adv_tar_name=adversarial_poison.tar.gz",
            "-P adv_data_dir=adv_poison_data",
        ]
    ),
    queue="tensorflow_cpu",
    depends_on=response_le_net_train_backdoor_model_clean["jobId"],
)

print("Inference job for LeNet-5 neural network submitted")
print("")

pprint.pprint(response_infer_pos_model_clean)
response_infer_pos_model_clean = get_run_id(response_infer_pos_model_clean)
print_mlflow_results(response_infer_pos_model_clean)

In [None]:
# Inference: Regular model on poisoned test images
response_infer_reg_model = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="infer",
    entry_point_kwargs=" ".join(
        [
            f"-P run_id={response_gen_poison_le_net_test['mlflowRunId']}",
            f"-P model_name={EXPERIMENT_NAME}_le_net",
            "-P model_version=none",
            "-P batch_size=512",
            "-P adv_tar_name=adversarial_poison.tar.gz",
            "-P adv_data_dir=adv_poison_data",
        ]
    ),
    queue="tensorflow_cpu",
    depends_on=response_le_net_train["jobId"],
)

print("Inference job for LeNet-5 neural network submitted")
print("")
pprint.pprint(response_infer_reg_model)
print_mlflow_results(response_infer_reg_model)
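
When comparing the two inference runs, it can help to separate ordinary accuracy from the rate at which triggered inputs are classified as the target class. The sketch below computes both from lists of predictions. The function names are hypothetical, the metrics logged by the `infer` entry point may be named and defined differently, and the prediction lists are made-up toy values for illustration only, not results from this demo.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)


def target_hit_rate(predictions, target_class):
    """Fraction of predictions that equal the backdoor target class."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    return sum(1 for pred in predictions if pred == target_class) / len(predictions)


# Toy values only: true labels of five triggered images, none of class 1.
true_labels = [0, 2, 3, 4, 5]
regular_model_preds = [0, 2, 3, 4, 5]
backdoored_model_preds = [1, 1, 3, 1, 1]

regular_accuracy = accuracy(regular_model_preds, true_labels)
backdoored_accuracy = accuracy(backdoored_model_preds, true_labels)
backdoor_success = target_hit_rate(backdoored_model_preds, target_class=1)
```

A successful backdoor would show up as a high target hit rate for the poisoned model on triggered images, while the regular model's predictions stay close to the true labels.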

## Defending against the clean label poisoning attack

Now we will explore the available defenses against the adversarial backdoor poisoning attack.
The following three jobs will run a selected defense (spatial smoothing, Gaussian augmentation, or JPEG compression) and evaluate the defense on the baseline and backdoor-trained models.

- The first job uses the selected defense entrypoint to apply a preprocessing defense over the poisoned test images.
- The second job runs the defended images against the poisoned backdoor model.
- The final job runs the defended images against the baseline model.

Ideally the defense will not impact the baseline model accuracy, while improving the backdoor model accuracy scores.

In [None]:
defenses = ["gaussian_augmentation", "spatial_smoothing", "jpeg_compression"]
defense = defenses[0]

response_poison_def = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point=defense,
    entry_point_kwargs=" ".join(
        [
            "-P batch_size=20",
            "-P load_dataset_from_mlruns=true",
            f"-P dataset_run_id={response_gen_poison_le_net_test['mlflowRunId']}",
            "-P dataset_tar_name=adversarial_poison.tar.gz",
            "-P dataset_name=adv_poison_data",
        ]
    ),
    queue="tensorflow_cpu",
    depends_on=response_gen_poison_le_net_test["jobId"],
)


print(f"Backdoor poison {defense} defense (LeNet-5 architecture) job submitted")
print("")
pprint.pprint(response_poison_def)
print("")

response_poison_def = get_run_id(response_poison_def)
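
Of the defenses listed above, spatial smoothing is commonly described as a local median filter applied to each image before inference. The sketch below implements a plain 3x3 median filter on a nested list to show the idea. It is a minimal illustration, not the `spatial_smoothing` entry point, and `median_filter` is a hypothetical name.

```python
from statistics import median


def median_filter(image, window=3):
    """Apply a window x window median filter to a 2D list, clamping at the borders."""
    height, width = len(image), len(image[0])
    half = window // 2
    smoothed = []
    for r in range(height):
        row = []
        for c in range(width):
            # Collect the neighborhood, reusing edge pixels outside the image bounds.
            neighborhood = [
                image[min(max(r + dr, 0), height - 1)][min(max(c + dc, 0), width - 1)]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            ]
            row.append(median(neighborhood))
        smoothed.append(row)
    return smoothed


# A blank 5x5 image with one bright pixel in the center.
noisy = [[0.0] * 5 for _ in range(5)]
noisy[2][2] = 1.0
smoothed = median_filter(noisy)
```

The isolated bright pixel disappears because it is outnumbered in every 3x3 neighborhood. Whether such a filter removes a given backdoor trigger depends on the window size and the size and shape of the trigger.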

In [None]:
# Inference: Poisoned model on defended poisoned test images.
response_infer_pos_model = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="infer",
    entry_point_kwargs=" ".join(
        [
            f"-P run_id={response_poison_def['mlflowRunId']}",
            f"-P model_name={EXPERIMENT_NAME}_data_poison_le_net",
            "-P model_version=none",
            "-P batch_size=512",
            f"-P adv_tar_name={defense}_dataset.tar.gz",
            "-P adv_data_dir=adv_testing",
        ]
    ),
    queue="tensorflow_cpu",
    depends_on=response_poison_def["jobId"],
)

print("Inference job for LeNet-5 neural network submitted")
print("")
pprint.pprint(response_infer_pos_model)
print_mlflow_results(response_infer_pos_model)


In [None]:
# Inference: Regular model on defended poisoned test images.
response_infer_reg_model = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="infer",
    entry_point_kwargs=" ".join(
        [
            f"-P run_id={response_poison_def['mlflowRunId']}",
            f"-P model_name={EXPERIMENT_NAME}_le_net",
            "-P model_version=none",
            "-P batch_size=512",
            f"-P adv_tar_name={defense}_dataset.tar.gz",
            "-P adv_data_dir=adv_testing",
        ]
    ),
    queue="tensorflow_cpu",
    depends_on=response_poison_def["jobId"],
)

print("Inference job for LeNet-5 neural network submitted")
print("")
pprint.pprint(response_infer_reg_model)
print_mlflow_results(response_infer_reg_model)