## Deploy NetFlow Network Intrusion Detection System Model Package from AWS Marketplace 
---
### Product Overview: 
This product detects network attacks in NetFlow records using machine learning techniques. The trained multiclass model classifies each flow into one of four main network traffic categories: Benign, Brute Force, DoS, and DDoS.

**Detection categories:**
- Benign: If the inference output is *Benign*, the input NetFlow record does not belong to any attack class; in other words, it is normal traffic.
- Brute Force: If the inference output for an input NetFlow record is *Brute Force*, that record belongs to a possible Brute Force attack.
- DoS: If the inference output for an input NetFlow record is *DoS*, that record belongs to a possible Denial of Service attack.
- DDoS: If the inference output for an input NetFlow record is *DDoS*, that record belongs to a possible Distributed Denial of Service attack.

**Expected input format:** The endpoint accepts two input formats: CSV and JSON. The input data must not contain a header row (field names).

**Required input data fields:** The Machine Learning model was trained with all the Cisco NetFlow V5 fields, so the same fields must be fed to the model to perform an inference.

The fields must be in the following order: 'srcaddr', 'dstaddr', 'nexthop',
                           'input', 'output', 'dPkts',
                           'dOctets', 'first', 'last',
                           'srcport', 'dstport',
                           'tcp_flags', 'prot', 'tos',
                           'src_as', 'dst_as', 'src_mask',
                           'dst_mask'.
                           
For the description and meaning of each field, see: https://www.ibm.com/docs/en/npi/1.3.0?topic=versions-netflow-v5-formats
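
As an illustration of the required field order, the sketch below serializes one flow record into a headerless CSV line. The helper name `flow_to_csv_row` and all sample values are hypothetical, and it assumes that IP addresses are passed as dotted IPv4 strings; only the field names and their order come from the list above.

```python
import csv
import io

# Field order required by the model (Cisco NetFlow V5), as listed above.
NETFLOW_V5_FIELDS = [
    "srcaddr", "dstaddr", "nexthop",
    "input", "output", "dPkts",
    "dOctets", "first", "last",
    "srcport", "dstport",
    "tcp_flags", "prot", "tos",
    "src_as", "dst_as", "src_mask",
    "dst_mask",
]


def flow_to_csv_row(flow: dict) -> str:
    """Serialize one flow as a CSV line, without a header, in model order."""
    missing = [name for name in NETFLOW_V5_FIELDS if name not in flow]
    if missing:
        raise KeyError(f"missing NetFlow V5 fields: {missing}")
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(
        [flow[name] for name in NETFLOW_V5_FIELDS]
    )
    return buffer.getvalue()


# Made-up values, for illustration only.
example_flow = {
    "srcaddr": "192.168.1.10", "dstaddr": "10.0.0.5", "nexthop": "0.0.0.0",
    "input": 1, "output": 2, "dPkts": 12,
    "dOctets": 3400, "first": 1000, "last": 2000,
    "srcport": 51234, "dstport": 22,
    "tcp_flags": 27, "prot": 6, "tos": 0,
    "src_as": 0, "dst_as": 0, "src_mask": 24,
    "dst_mask": 24,
}

row = flow_to_csv_row(example_flow)
print(row, end="")
```

Building the row from a fixed field list, rather than from the dictionary's own key order, keeps the payload aligned with the order the model was trained on.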


**Expected output format:** The endpoint response is in JSON format. It contains all the input data plus two new fields for each record: 'prediction', the predicted category, and 'confidence', the model's confidence in that prediction among all the possible categories, expressed as a percentage.

If a network flow has a wrong input format, the model skips inference for that flow and appends an error message at the end of the row. The possible error messages are:
- “Wrong data type” if the values in a record are not numbers
- “Wrong number of columns” if the number of columns in a record differs from the 18 NetFlow V5 fields
- “Wrong IP address” if an IP address is malformed (only IPv4 is allowed)
- “Null data” if one or more fields are empty
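
The checks above can be approximated on the client side before a payload is sent. The sketch below is not the model's own validation: the function name `precheck_row`, the order in which the checks run, and the assumption that the first three columns hold dotted IPv4 strings while the rest are numeric are all guesses made for illustration.

```python
import ipaddress
from typing import Optional, Sequence

EXPECTED_COLUMNS = 18      # number of NetFlow V5 fields the model expects
IP_COLUMNS = (0, 1, 2)     # assumed positions of srcaddr, dstaddr, nexthop


def precheck_row(values: Sequence[str]) -> Optional[str]:
    """Return the likely error message for a row, or None if it looks valid."""
    if len(values) != EXPECTED_COLUMNS:
        return "Wrong number of columns"
    if any(value is None or str(value).strip() == "" for value in values):
        return "Null data"
    for index, value in enumerate(values):
        text = str(value).strip()
        if index in IP_COLUMNS:
            try:
                ipaddress.IPv4Address(text)   # rejects IPv6 and malformed input
            except ValueError:
                return "Wrong IP address"
        else:
            try:
                float(text)
            except ValueError:
                return "Wrong data type"
    return None


# Made-up rows, for illustration only.
valid_row = ["192.168.1.10", "10.0.0.5", "0.0.0.0"] + ["1"] * 15
ipv6_row = ["2001:db8::1"] + valid_row[1:]
empty_field_row = valid_row[:3] + [""] + valid_row[4:]
text_field_row = valid_row[:3] + ["abc"] + valid_row[4:]

for candidate in (valid_row, valid_row[:-1], ipv6_row, empty_field_row, text_field_row):
    print(precheck_row(candidate))
```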

**Model performance:**
   -  Overall accuracy (binary detection): 84%
   -  Overall recall (binary detection): 84%
   -  Multiclass accuracy (multiclass detection): 78%
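
The listing does not define "binary detection". One common reading, assumed here, is that Brute Force, DoS, and DDoS are collapsed into a single attack class and compared against Benign. The sketch below shows how the two kinds of metrics relate under that assumption; the labels are toy data and do not reproduce the figures reported above.

```python
ATTACK_CLASSES = {"Brute Force", "DoS", "DDoS"}


def to_binary(label: str) -> str:
    """Collapse the three attack categories into one 'Attack' class (assumption)."""
    return "Attack" if label in ATTACK_CLASSES else "Benign"


def accuracy(y_true, y_pred) -> float:
    """Fraction of records whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive: str = "Attack") -> float:
    """Fraction of truly positive records that were predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    false_neg = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return true_pos / (true_pos + false_neg)


# Toy labels, for illustration only.
y_true = ["Benign", "DoS", "DDoS", "Brute Force", "Benign"]
y_pred = ["Benign", "DDoS", "DDoS", "Benign", "Benign"]

multiclass_accuracy = accuracy(y_true, y_pred)
binary_true = [to_binary(label) for label in y_true]
binary_pred = [to_binary(label) for label in y_pred]
binary_accuracy = accuracy(binary_true, binary_pred)
binary_recall = recall(binary_true, binary_pred)

print(multiclass_accuracy, binary_accuracy, binary_recall)
```

Confusing one attack type with another (DoS predicted as DDoS in the toy data) lowers multiclass accuracy but not the binary metrics, which is one way a binary score can exceed the multiclass one.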

---

### Model package deploy example
This sample notebook shows you how to deploy <font color='red'> For Seller to update:[Title_of_your_ML Model](Provide link to your marketplace listing of your product)</font> using Amazon SageMaker.

> **Note**: This is a reference notebook and it cannot run unless you make the changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements which render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to <font color='red'> For Seller to update:[Title_of_your_ML Model](Provide link to your marketplace listing of your product)</font>. If so, skip step: [Subscribe to the model package](#1.-Subscribe-to-the-model-package)

#### Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
   1. [Create an endpoint](#A.-Create-an-endpoint)
   2. [Create input payload](#B.-Create-input-payload)
   3. [Perform real-time inference](#C.-Perform-real-time-inference)
   4. [Visualize output](#D.-Visualize-output)
   5. [Delete the endpoint](#E.-Delete-the-endpoint)
3. [Perform batch inference](#3.-Perform-batch-inference) 
4. [Clean-up](#4.-Clean-up)
    1. [Delete the model](#A.-Delete-the-model)
    2. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))
    

#### Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

### 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page <font color='red'> For Seller to update:[Title_of_your_product](Provide link to your marketplace listing of your product).</font>
1. On the AWS Marketplace listing, click on the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review and click on **"Accept Offer"** if you and your organization agree with the EULA, pricing, and support terms.
1. Once you click on the **Continue to configuration** button and then choose a **region**, you will see a **Product Arn** displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.

In [1]:
model_package_arn = "arn:aws:sagemaker:us-east-1:452490241637:model-package/netflow-threats-detection"
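
Because the ARN is region-specific, it can help to confirm that its region matches the region the notebook session runs in. The following is a minimal sketch using plain string parsing; the helper name `model_package_region` is hypothetical, and `session_region` is hard-coded here, whereas in the notebook it would come from `sagemaker_session.boto_region_name`.

```python
def model_package_region(arn: str) -> str:
    """Return the region component of a SageMaker model package ARN."""
    # General ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"not a SageMaker ARN: {arn!r}")
    if not parts[5].startswith("model-package/"):
        raise ValueError(f"not a model package ARN: {arn!r}")
    return parts[3]


model_package_arn = (
    "arn:aws:sagemaker:us-east-1:452490241637:model-package/netflow-threats-detection"
)
session_region = "us-east-1"  # placeholder for sagemaker_session.boto_region_name

arn_region = model_package_region(model_package_arn)
if arn_region != session_region:
    print(f"ARN is for {arn_region}, but the session uses {session_region}")
else:
    print(f"ARN region matches the session region: {arn_region}")
```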

In [2]:
import base64
import json
import uuid
from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role
from urllib.parse import urlparse
import boto3
from IPython.display import Image
from PIL import Image as ImageEdit
import urllib.request
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)

In [3]:
role = get_execution_role()

sagemaker_session = sage.Session()

runtime = boto3.client("sagemaker-runtime")

### 2. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

<font color='red'>For Seller to update: update the values of the variables in the following cell. 
Specify a model/endpoint name using only alphanumeric characters and hyphens. </font>

In [4]:
model_name = "netlow-ai"

real_time_inference_instance_type = "ml.t2.medium"
batch_transform_inference_instance_type = "ml.m5.large"
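
A name can be checked locally before any resource is created. The sketch below assumes the constraints documented for endpoint names in the SageMaker `CreateEndpoint` API reference at the time of writing (alphanumeric characters separated by hyphens, at most 63 characters); treat both the pattern and the helper name `is_valid_endpoint_name` as assumptions and verify them against the current API reference.

```python
import re

# Assumed constraints, taken from the CreateEndpoint API reference.
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
MAX_NAME_LENGTH = 63


def is_valid_endpoint_name(name: str) -> bool:
    """Check a candidate name against the assumed length and character rules."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    return ENDPOINT_NAME_PATTERN.fullmatch(name) is not None


for candidate in ("netlow-ai", "netflow_ai", "-netflow", "a" * 64):
    print(candidate[:20], is_valid_endpoint_name(candidate))
```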

#### A. Create an endpoint

In [6]:
# create a deployable model from the model package.
model = ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)
# Deploy the model
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

---------!

Once the endpoint has been created, you can perform real-time inference.

#### B. Create input payload

- Payload stored in a local variable

In [None]:
#define Cisco Netflow V5 column names
netflow_v5_column_names = ['srcaddr', 'dstaddr', 'nexthop',
                           'input', 'output', 'dPkts',
                           'dOctets', 'first', 'last',
                           'srcport', 'dstport',
                           'tcp_flags', 'prot', 'tos',
                           'src_as', 'dst_as', 'src_mask',
                           'dst_mask', 'prediction', 'confidence']

In [None]:
#read and show the csv format example data
data = pd.read_csv('data/input/real-time/example-data.csv',
                   names = netflow_v5_column_names[:-2],
                   sep = ',')
data.head()


In [None]:
# Convert the DataFrame back to headerless CSV and define the data content type
input_data = data.to_csv(header=False, index=False)
content_type = 'text/csv'

- Payload stored in a file

In [7]:
# Read data from the example data file, then define the data content type.
# Two input formats are allowed: CSV and JSON. Uncomment the option you want to try.
input_data_file = 'data/input/real-time/netflow-v5-real-time-sample.csv'
#input_data_file = 'data/input/real-time/netflow-flows.json'

content_type_file = "text/csv"
#content_type_file = "application/json"
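
Instead of toggling two pairs of lines by hand, the content type can be derived from the file extension. This is a small sketch; the helper `content_type_for` is hypothetical, and the mapping covers only the two formats this notebook names.

```python
from pathlib import Path

# The two input formats accepted by the endpoint, keyed by file extension.
CONTENT_TYPES = {".csv": "text/csv", ".json": "application/json"}


def content_type_for(path: str) -> str:
    """Pick the request content type from the payload file's extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"unsupported payload extension: {suffix!r}")
    return CONTENT_TYPES[suffix]


print(content_type_for("data/input/real-time/netflow-v5-real-time-sample.csv"))
print(content_type_for("data/input/real-time/netflow-flows.json"))
```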

- Output target file

In [9]:
#path where the result will be stored
output_file_name = 'data/output/example-data-out.json'

#### C. Perform real-time inference

A Machine Learning model accepts a payload and returns an inference.

- Inference from the local variable, using the Boto3 SDK

In [None]:
#run inference using the locally stored payload, then print the result in JSON format
response = runtime.invoke_endpoint(
    EndpointName=model_name,
    Body=input_data,
    ContentType=content_type,
)
response_body = json.loads(response['Body'].read().decode('utf-8'))
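
Because the CSV payload carries no header row, a large set of flows can be split on line boundaries and sent as several smaller requests. The sketch below shows only the splitting step; the helper `chunk_csv` and the chunk size are hypothetical, and the endpoint's actual payload limit is not stated in this notebook.

```python
def chunk_csv(csv_text: str, max_rows: int):
    """Yield headerless CSV payloads containing at most max_rows rows each."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    # Drop blank lines so that no chunk carries an empty record.
    lines = [line for line in csv_text.splitlines() if line.strip()]
    for start in range(0, len(lines), max_rows):
        yield "\n".join(lines[start:start + max_rows]) + "\n"


# Stand-in rows; real rows would be 18-field NetFlow V5 records.
sample_payload = "row1\nrow2\nrow3\nrow4\nrow5\n"
chunks = list(chunk_csv(sample_payload, max_rows=2))
print(chunks)
```

Each chunk could then be passed as `Body` to `runtime.invoke_endpoint`, as in the cell above, and the per-chunk results concatenated.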

- Inference from the file-stored payload, using the AWS CLI

In [10]:
#Run inference using only the path in which the payload is stored. Output is then saved to the output_file_name file.
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name $model_name \
    --body fileb://$input_data_file \
    --content-type $content_type_file \
    --region $sagemaker_session.boto_region_name \
    $output_file_name

{
    "ContentType": "text/html; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}


#### D. Visualize output

Display the output generated by real-time inference. The last two columns are 'prediction', the result of the inference, and 'confidence', the model's confidence in that prediction among all the possible categories, expressed as a percentage.

- From the Boto3 inference endpoint request:

In [None]:
#visualize endpoint output as a Pandas Dataframe.
pd.DataFrame(response_body['prediction'], columns = netflow_v5_column_names)

- From the AWS CLI inference endpoint request:

In [None]:
#visualize endpoint output as a Pandas Dataframe.
with open(output_file_name, 'r') as j:
    contents = json.load(j)['prediction']
output_data = pd.DataFrame(contents, columns = netflow_v5_column_names)
output_data.head()
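
Once the rows are loaded, a typical next step is to keep only the flows flagged as attacks. The sketch below works on plain lists shaped like the `'prediction'` entries used above (18 input fields followed by `prediction` and `confidence`). The sample rows are made up, the helper `flag_attacks` is hypothetical, and the code assumes that confidence is reported on a 0 to 100 scale and that rows carrying an error message have a different length.

```python
COLUMN_NAMES = [
    "srcaddr", "dstaddr", "nexthop", "input", "output", "dPkts",
    "dOctets", "first", "last", "srcport", "dstport",
    "tcp_flags", "prot", "tos", "src_as", "dst_as", "src_mask",
    "dst_mask", "prediction", "confidence",
]


def flag_attacks(rows, min_confidence: float = 0.0):
    """Return non-Benign records whose confidence is at least min_confidence."""
    flagged = []
    for row in rows:
        if len(row) != len(COLUMN_NAMES):
            continue  # assumed shape of rows that carry an error message
        record = dict(zip(COLUMN_NAMES, row))
        if record["prediction"] != "Benign" and float(record["confidence"]) >= min_confidence:
            flagged.append(record)
    return flagged


# Made-up flow values and predictions, for illustration only.
base_flow = ["192.168.1.10", "10.0.0.5", "0.0.0.0", 1, 2, 12,
             3400, 1000, 2000, 51234, 22, 27, 6, 0, 0, 0, 24, 24]
sample_rows = [
    base_flow + ["Brute Force", 91.0],
    base_flow + ["Benign", 99.0],
    base_flow + ["DoS", 55.0],
]

flagged = flag_attacks(sample_rows, min_confidence=80.0)
for record in flagged:
    print(record["srcaddr"], record["dstport"], record["prediction"], record["confidence"])
```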

#### E. Delete the endpoint

Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged.

In [11]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)

### 3. Perform batch inference

In this section, you will perform batch inference using multiple input payloads together. If you are not familiar with batch transform, and want to learn more, see these links:
1. [How it works](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html)
2. [How to run a batch transform job](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html)

Define bucket name and prefix to store the input/output batch data in AWS S3 service.

In [None]:
# Define the bucket name and key prefix used for batch inference.
# If bucket_name is None, the SageMaker session's default bucket is used.
bucket_name = 'your-bucket-name'
key_prefix = 'your/key/prefix'

# S3 URI inside the bucket where the inference output will be stored
output_path = 's3://your-bucket-name/your/output/prefix'

In [None]:
# path where input batch payload is located
transform_input_folder = "data/input/batch"

# upload the batch-transform job input files to S3
transform_input = sagemaker_session.upload_data(transform_input_folder,
                                                bucket=bucket_name,
                                                key_prefix=key_prefix)
print("Transform input uploaded to " + transform_input)

In [None]:
# Run the batch-transform job
transformer = model.transformer(1,
                                batch_transform_inference_instance_type,
                                output_path = output_path)
transformer.transform(transform_input,
                      content_type=content_type)
transformer.wait()

In [None]:
# output is available on following path
transformer.output_path

The output is in JSON format. To view the output file, run the following cell. Alternatively, go to S3, download the file, and open it with an appropriate editor.

In [None]:
parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
file_key = "{}/{}.out".format(parsed_url.path[1:], 'example-data-1.txt'.split("/")[-1])
print(file_key)
s3_client = sagemaker_session.boto_session.client("s3")

response = s3_client.get_object(Bucket=bucket_name, Key=file_key)

response_bytes = response["Body"].read().decode("utf-8")
print(response_bytes)
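
The key construction in the cell above relies on batch transform naming each result after its input file with `.out` appended. The sketch below isolates that step so it can be reused for other input files; the helper `batch_output_location` and the bucket name are hypothetical.

```python
from urllib.parse import urlparse


def batch_output_location(output_path: str, input_file_name: str):
    """Return (bucket, key) of the result object for one batch input file."""
    parsed = urlparse(output_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3:// URI, got {output_path!r}")
    prefix = parsed.path.strip("/")
    # Batch transform stores each result as <input file name>.out
    file_name = f"{input_file_name}.out"
    key = f"{prefix}/{file_name}" if prefix else file_name
    return parsed.netloc, key


bucket, key = batch_output_location(
    "s3://example-bucket/netflow/output", "example-data-1.txt"
)
print(bucket, key)
```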

### 4. Clean-up

#### A. Delete the model

In [12]:
model.delete_model()

#### B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

