# Train Image Anomaly Detection Model with Amazon SageMaker


In [None]:
import sagemaker

# The session wraps the AWS API calls; the role grants SageMaker access to your resources.
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
print(role)

## Upload Training data to S3
First we download the UCSD Anomaly Detection dataset and upload the training data to S3. For simplicity, we read all the training images into memory and combine them into one large NumPy array.

In [None]:
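import glob
import os
import tarfile
from urllib import request

import numpy as np
from PIL import Image

# Download and extract the dataset if the archive is not present yet.
if not os.path.isfile("UCSD_Anomaly_Dataset.tar.gz"):
    request.urlretrieve("http://www.svcl.ucsd.edu/projects/anomaly/UCSD_Anomaly_Dataset.tar.gz", "UCSD_Anomaly_Dataset.tar.gz")
    with tarfile.open("UCSD_Anomaly_Dataset.tar.gz") as tar:
        tar.extractall()

files = sorted(glob.glob('UCSD_Anomaly_Dataset.v1p2/UCSDped1/Train/*/*'))
# One array in (N, channels, height, width) layout with a single channel per image.
a = np.zeros((len(files), 1, 100, 100))

for idx, path in enumerate(files):
    im = Image.open(path)
    im = im.resize((100, 100))
    # Scale pixel values from [0, 255] to [0, 1].
    a[idx, 0, :, :] = np.array(im, dtype=np.float32) / 255.0

# np.save appends the .npy extension to the file name.
np.save("input_data_sagemaker", a)
# `bucket` takes the bucket name, not an s3:// URI; replace the placeholder with your bucket.
inputs = sagemaker_session.upload_data(path='input_data_sagemaker.npy', bucket='MY_S3_BUCKET', key_prefix='data')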
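
Before uploading, it can help to confirm that the saved file loads back with the expected layout and value range. The following sketch is not part of the original notebook: it uses a small random array as a stand-in for the real frames, and the helper name `check_npy_file` is hypothetical.

```python
import os
import tempfile

import numpy as np


def check_npy_file(path, expected_shape_tail=(1, 100, 100)):
    """Load a .npy file and verify its layout and value range.

    Hypothetical helper: expects a 4-D array of shape (N, 1, 100, 100)
    whose values were scaled to [0, 1].
    """
    data = np.load(path)
    if data.ndim != 4 or data.shape[1:] != expected_shape_tail:
        raise ValueError("unexpected shape: {}".format(data.shape))
    if data.min() < 0.0 or data.max() > 1.0:
        raise ValueError("pixel values are not scaled to [0, 1]")
    return data.shape


# Stand-in data: 8 random "frames" instead of the UCSD images.
demo = np.random.rand(8, 1, 100, 100).astype(np.float32)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_input.npy")
np.save(demo_path, demo)
demo_shape = check_npy_file(demo_path)
print(demo_shape)
```

In the notebook itself, the same check could be pointed at `input_data_sagemaker.npy` instead of the temporary demo file.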

## Train and Deploy the standard way
The training data must be in an S3 bucket; the script ```upload_data.py``` shows how to upload it.
Once the data is uploaded, you can define the MXNet Estimator. It takes as arguments the entry point ```train.py```, the role, the training instance type, the S3 path where the training output is stored, and the S3 path where the code is uploaded. If you don't specify these paths, SageMaker uses a default bucket. We also specify the version of the deep learning framework and the hyperparameters.

For debugging, it can be useful to set the instance type to ```local``` at first. SageMaker then executes your code on your local notebook instance.

In [None]:
from sagemaker.mxnet import MXNet

# Name of your S3 bucket (without the s3:// prefix); set this before running the cell.
MY_S3_BUCKET = ''

mxnet_estimator = MXNet('train.py',
                        role=role,
                        train_instance_type='local',  # e.g. 'ml.m5.xlarge' to train on a managed instance
                        train_instance_count=1,
                        output_path='s3://{}'.format(MY_S3_BUCKET),
                        code_location='s3://{}'.format(MY_S3_BUCKET),
                        framework_version='1.3.0',
                        py_version='py2',
                        hyperparameters={'batch_size': 16,
                                         'epochs': 10,
                                         'learning_rate': 0.0001,
                                         'wd': 0.0})
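
Mixing up a bare bucket name with a full `s3://` URI is an easy mistake when filling in these paths. The helper below is a hypothetical convenience, not part of the SageMaker SDK, that builds a URI from a bucket name and optional key parts and rejects an empty bucket name. The bucket name `my-example-bucket` is a placeholder.

```python
def s3_uri(bucket, *key_parts):
    """Build an s3:// URI from a bucket name and optional key parts.

    Hypothetical helper, not part of the SageMaker SDK. Accepts the bucket
    with or without a leading 's3://' and strips stray slashes.
    """
    name = bucket[len("s3://"):] if bucket.startswith("s3://") else bucket
    name = name.strip("/")
    if not name:
        raise ValueError("bucket name must not be empty")
    parts = [p.strip("/") for p in key_parts if p.strip("/")]
    return "/".join(["s3://" + name] + parts)


print(s3_uri("my-example-bucket"))
print(s3_uri("s3://my-example-bucket/", "data", "input_data_sagemaker.npy"))
```

Raising on an empty name is a deliberate choice here: it surfaces an unset bucket variable immediately rather than when the training job starts.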


Now we can call ```fit``` on our training data. Behind the scenes, SageMaker spins up the EC2 instance specified in ```train_instance_type``` (unless it is set to local). Once the instance is ready, SageMaker downloads an MXNet Docker container and executes the function ```train()``` from ```train.py```, which creates and trains the model. After training, the model is saved to the location given by ```output_path```.

In [None]:

# `inputs` is the S3 URI returned by upload_data above.
mxnet_estimator.fit({'train': inputs})

Once our autoencoder model is trained, we can deploy it. Here the endpoint runs on an ```ml.m5.xlarge``` instance, which does not provide any GPUs. Inference will be slower than on a GPU instance, but this instance type is cheaper.

In [None]:
predictor = mxnet_estimator.deploy(instance_type='ml.m5.xlarge', initial_instance_count=1)

Now that the endpoint is ready, we can send requests to it. SageMaker provides default code for model inference, but it is often useful to customize these functions; for this reason, ```train.py``` overrides the default ```model_fn```. In the following example, we send image data as a NumPy array to the endpoint. The endpoint verifies the request, parses the input, loads the model, and returns the inference results.

In [None]:
files = sorted(glob.glob('UCSD_Anomaly_Dataset.v1p2/UCSDped1/Test/*/*'))

im = Image.open(files[0])
im = im.resize((100,100))
test_image = np.array(im, dtype=np.float32)/255.0

In [None]:
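from sagemaker.predictor import numpy_deserializer, npy_serializer

# Exchange data with the endpoint in NumPy's .npy format.
predictor.accept = 'application/x-npy'
predictor.content_type = 'application/x-npy'
predictor.deserializer = numpy_deserializer
predictor.serializer = npy_serializer

# Send the test image as a batch of one, in the same (N, 1, 100, 100) layout as the training data.
print(predictor.predict(test_image.reshape(1, 1, 100, 100)))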
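
The notebook stops at printing the raw endpoint response. Assuming the autoencoder returns a reconstruction with the same shape as its input (the source does not confirm this), one common way to turn that response into an anomaly score is the mean squared reconstruction error per image. The sketch below uses synthetic arrays in place of real endpoint output; the function name `reconstruction_error` and the threshold value are illustrative only.

```python
import numpy as np


def reconstruction_error(original, reconstructed):
    """Return the mean squared error per image for (N, C, H, W) arrays."""
    original = np.asarray(original, dtype=np.float32)
    reconstructed = np.asarray(reconstructed, dtype=np.float32)
    if original.shape != reconstructed.shape:
        raise ValueError(
            "shapes differ: {} vs {}".format(original.shape, reconstructed.shape))
    diff = original - reconstructed
    # Average over every axis except the batch axis.
    return np.mean(diff ** 2, axis=tuple(range(1, diff.ndim)))


# Synthetic stand-ins: image 0 is reconstructed perfectly, image 1 poorly.
frames = np.zeros((2, 1, 100, 100), dtype=np.float32)
fake_output = frames.copy()
fake_output[1] += 0.5

scores = reconstruction_error(frames, fake_output)
threshold = 0.1  # illustrative value; would need tuning on real data
is_anomaly = scores > threshold
print(scores, is_anomaly)
```

With a real endpoint, `fake_output` would be replaced by the array returned from `predictor.predict`, provided it has the shape assumed here.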