# Deploying a Model Using Amazon SageMaker

## Project

This is going to be my first end-to-end machine learning project. I will pre-process the data and develop a model, as I did in each of my previous ML projects. But this time I will also deploy the model to the web, so that people can use it. The model and the web application are going to be simple, as my focus is mainly on doing it end to end using AWS and Amazon SageMaker.

The end result is going to be a simple web app that, given a movie review, will tell you whether the person liked the movie or not. I will stick with a basic RNN for the modelling part.

You can treat it as a tutorial and just read it, or even execute the project on your own by uploading the code to SageMaker and following this notebook step by step. I have covered all the steps here. So if you want, open it on AWS and let's start.

### Opening a notebook instance in Amazon SageMaker

The notebook uses an `ml.m4.xlarge` instance. You need to make sure that your account limit for this instance type is set to at least `1`. If not, create a case in AWS Support to increase the limit.

Then you can go to the 'Amazon SageMaker' service and choose 'Create notebook instance'. In the creator you need to configure a few settings. For the name, choose whatever you want. The notebook instance type can be left at the default `ml.t2.medium`. Under role, choose 'Create a new role'. Then just select 'None' under 'S3 buckets you specify' and create the role. Now you can scroll to the bottom and create the notebook instance.

Once it is created, you can open it and then open a terminal in it. Next, enter the `SageMaker` directory and clone this GitHub repository. Now you have everything set up.

## Outline

The complete machine learning workflow, using a Jupyter notebook within SageMaker, consists of the following steps:

1. Download the data.
2. Prepare it for the model.
3. Copy the data to S3.
4. Train the model.
5. Test it (using a batch transform job).
6. Deploy it.
7. Use the deployed model.

I am going to follow these steps with some modifications. In the testing step, I will deploy the model for testing. This will let me check not only whether the model itself works as expected but also whether I am deploying it correctly. In the next step, I will deploy it again, but this time integrating it with a web page.

## Step 1: Download the data

I am going to download the [IMDb dataset](http://ai.stanford.edu/~amaas/data/sentiment/) of reviews from the original page. The data is compressed, so I then extract it to a data directory.

In [1]:
%mkdir ./data
!wget -O ./data/aclImdb_v1.tar.gz http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -zxf ./data/aclImdb_v1.tar.gz -C ./data

mkdir: cannot create directory ‘./data’: File exists
--2020-11-03 22:19:00--  http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Resolving ai.stanford.edu (ai.stanford.edu)... 171.64.68.10
Connecting to ai.stanford.edu (ai.stanford.edu)|171.64.68.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: ‘./data/aclImdb_v1.tar.gz’


2020-11-03 22:19:12 (6.83 MB/s) - ‘./data/aclImdb_v1.tar.gz’ saved [84125825/84125825]



## Step 2: Prepare it for the model

The reviews are just text files. They are already divided into *test* and *train* directories, and each of them is divided further into *neg* and *pos* subdirectories. The next thing to do is to read them into memory. I am going to read all reviews into nested dictionaries, so I can keep the information about the split and the labels.

In [2]:
import os
import glob

def read_imdb_data(data_dir = '../data/aclImdb'):
    data = {}
    labels = {}
    
    for data_type in ['train', 'test']:
        data[data_type] = {}
        labels[data_type] = {}
        
        for sentiment in ['pos', 'neg']:
            data[data_type][sentiment] = []
            labels[data_type][sentiment] = []
            
            path = os.path.join(data_dir, data_type, sentiment, '*.txt')
            files = glob.glob(path)
            
            for f in files:
                with open(f) as review:
                    data[data_type][sentiment].append(review.read())
                    # Represent a positive review by '1' and a negative review by '0'
                    labels[data_type][sentiment].append(1 if sentiment == 'pos' else 0)
                    
            assert len(data[data_type][sentiment]) == len(labels[data_type][sentiment]), \
                    "{}/{} data size does not match labels size".format(data_type, sentiment)
                
    return data, labels

In [3]:
data, labels = read_imdb_data()
print("IMDB reviews: train = {} pos / {} neg, test = {} pos / {} neg".format(
            len(data['train']['pos']), len(data['train']['neg']),
            len(data['test']['pos']), len(data['test']['neg'])))

IMDB reviews: train = 12500 pos / 12500 neg, test = 12500 pos / 12500 neg


What you actually need for training are complete datasets. I will join positive and negative reviews together and shuffle them for each split.

In [4]:
from sklearn.utils import shuffle

def prepare_imdb_data(data, labels):
    """Prepare training and test sets from IMDb movie reviews."""
    
    #Combine positive and negative reviews and labels
    data_train = data['train']['pos'] + data['train']['neg']
    data_test = data['test']['pos'] + data['test']['neg']
    labels_train = labels['train']['pos'] + labels['train']['neg']
    labels_test = labels['test']['pos'] + labels['test']['neg']
    
    #Shuffle reviews and corresponding labels within training and test sets
    data_train, labels_train = shuffle(data_train, labels_train)
    data_test, labels_test = shuffle(data_test, labels_test)
    
    # Return unified training data, test data, training labels, test labels
    return data_train, data_test, labels_train, labels_test

In [5]:
train_X, test_X, train_y, test_y = prepare_imdb_data(data, labels)
print("IMDb reviews (combined): train = {}, test = {}".format(len(train_X), len(test_X)))

IMDb reviews (combined): train = 25000, test = 25000


Let's do a quick check and see an example of the data the model will be trained on. This is generally a good idea as it allows you to see how each of the further processing steps affects the reviews and it also ensures that the data has been loaded correctly.

I'm going to print the review at index 100 and its corresponding label.

In [6]:
print(train_X[100])
print(train_y[100])

I had to give this film a 10 simply because it did what so many films thrown at black audiences have FAILED MISERABLY to do. This film was void of all the video whore clichés, and it also skipped the gangsters, rappers, and foul language. It examined several relationships among physicians, African Americans, and African American male/female romantic relationships on a completely new and refreshing level. I was highly impressed with the films careful mix of light headed humor with some pretty tough and heavy issues. The film will leave you feeling happy and sad all at the same time. I saw it at the Boston Film Festival as well. It premiered at AMC Loews Boston and I truly hope this film makes it to much larger audiences. Black People (and EVERYONE else) would LOVE a movie like this- if only the industry was SMART enough to put them out there!<br /><br />As an extra- if your a doctor, a resident, a medical student, a premedical student, married to a doctor, have a doctor sibling, have a 

The result looks as expected. There is a single review and a single label. 1 means that the review is positive and if you read it, it actually is.

When you read the review, you can notice that it contains some HTML tags like `<br />`. The first step in processing the data will be to remove them. In addition, you want to stem the words. It simply means that words such as *entertained* and *entertaining* are reduced to a common form like *entertain*. Thanks to this, they will be treated as the same word during analysis. And they should be, as the word ending carries little additional information about the text's sentiment.

In [7]:
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import *

import re
from bs4 import BeautifulSoup

def review_to_words(review):
    nltk.download('stopwords', quiet = True)
    stemmer = PorterStemmer()
    english_stopwords = set(stopwords.words('english')) # Build the stopword set once per call
    
    text = BeautifulSoup(review, 'html.parser').get_text() # Remove HTML tags
    text = re.sub(r'[^a-zA-Z0-9]', ' ', text.lower()) # Convert to lower case and replace punctuation with spaces
    words = text.split() # Split string into words
    words = [w for w in words if w not in english_stopwords] # Remove stopwords
    words = [stemmer.stem(w) for w in words] # Stem each word, reusing a single stemmer
    
    return words

The `review_to_words` function defined above uses `BeautifulSoup` to remove any HTML tags that appear and uses the `nltk` package to remove stopwords and stem the words. Let's test it by applying it to a single review, the one at index 100.

In [8]:
review_to_words(train_X[100])

['give',
 'film',
 '10',
 'simpli',
 'mani',
 'film',
 'thrown',
 'black',
 'audienc',
 'fail',
 'miser',
 'film',
 'void',
 'video',
 'whore',
 'clich',
 'also',
 'skip',
 'gangster',
 'rapper',
 'foul',
 'languag',
 'examin',
 'sever',
 'relationship',
 'among',
 'physician',
 'african',
 'american',
 'african',
 'american',
 'male',
 'femal',
 'romant',
 'relationship',
 'complet',
 'new',
 'refresh',
 'level',
 'highli',
 'impress',
 'film',
 'care',
 'mix',
 'light',
 'head',
 'humor',
 'pretti',
 'tough',
 'heavi',
 'issu',
 'film',
 'leav',
 'feel',
 'happi',
 'sad',
 'time',
 'saw',
 'boston',
 'film',
 'festiv',
 'well',
 'premier',
 'amc',
 'loew',
 'boston',
 'truli',
 'hope',
 'film',
 'make',
 'much',
 'larger',
 'audienc',
 'black',
 'peopl',
 'everyon',
 'els',
 'would',
 'love',
 'movi',
 'like',
 'industri',
 'smart',
 'enough',
 'put',
 'extra',
 'doctor',
 'resid',
 'medic',
 'student',
 'premed',
 'student',
 'marri',
 'doctor',
 'doctor',
 'sibl',
 'doctor',
 'fami

What does the function actually do, judging by this example? Let me bring everything together.

It converts the text to lowercase, splits it into a list of words and removes stopwords (the most common words that don't carry much information on their own). It also removes punctuation, so words like *clay* and *clay,* (with a comma) are not analyzed separately. You can also see the effect of stemming. For example, the stem *simpli* represents the word *simply* and *audienc* represents *audiences* from the review you saw before.
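To make the cleaning steps described above concrete, here is a minimal sketch that uses only the standard library. It is not the notebook's `review_to_words`: the name `toy_clean` and the tiny `TOY_STOPWORDS` set are made up for illustration, HTML tags are removed with a crude regular expression instead of `BeautifulSoup`, and stemming is left out.

```python
import re

# Hypothetical, tiny stopword set used only for this illustration;
# the notebook uses NLTK's full English list instead.
TOY_STOPWORDS = {"the", "a", "is", "it", "and", "this", "of"}

def toy_clean(text):
    """Strip tags, lowercase, replace punctuation with spaces, drop stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)               # crude HTML tag removal
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())  # lowercase, punctuation -> space
    return [w for w in text.split() if w not in TOY_STOPWORDS]

print(toy_clean("This clay, is GREAT!<br /><br />It is the best of clay."))
# ['clay', 'great', 'best', 'clay']
```

Note that *clay,* and *clay.* both end up as the same token, which is exactly the effect of the punctuation step.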

Now that it is clear how it works, I'm going to apply it to the data. This operation can take a while, so I will also save the result to a file. Next time, you can read it from the cache without needing to process the text again.

In [9]:
import pickle

cache_dir = os.path.join('./cache', 'sentiment_analysis')  # Where to store cache files
os.makedirs(cache_dir, exist_ok = True)  # Ensure cache directory exists

def preprocess_data(data_train, 
                    data_test, 
                    labels_train, 
                    labels_test,
                    cache_dir = cache_dir, 
                    cache_file = 'preprocessed_data.pkl'):
    '''Convert each review to words. Read from cache if available.'''

    # If cache_file is not None, try to read from it first
    cache_data = None
    if cache_file is not None:
        try:
            with open(os.path.join(cache_dir, cache_file), 'rb') as f:
                cache_data = pickle.load(f)
            print('Read preprocessed data from cache file:', cache_file)
        except Exception:
            pass  # Unable to read from cache, but that's okay
    
    # If cache is missing, then do the heavy lifting
    if cache_data is None:
        # Preprocess training and test data to obtain words for each review
        # words_train = list(map(review_to_words, data_train))
        # words_test = list(map(review_to_words, data_test))
        words_train = [review_to_words(review) for review in data_train]
        words_test = [review_to_words(review) for review in data_test]
        
        # Write to cache file for future runs
        if cache_file is not None:
            cache_data = dict(words_train = words_train, 
                              words_test = words_test,
                              labels_train = labels_train, 
                              labels_test = labels_test)
            with open(os.path.join(cache_dir, cache_file), 'wb') as f:
                pickle.dump(cache_data, f)
            print('Wrote preprocessed data to cache file:', cache_file)
    else:
        # Unpack data loaded from cache file
        words_train, words_test, labels_train, labels_test = (cache_data['words_train'], 
                                                              cache_data['words_test'], 
                                                              cache_data['labels_train'], 
                                                              cache_data['labels_test'])
    
    return words_train, words_test, labels_train, labels_test

In [10]:
# Preprocess data
train_X, test_X, train_y, test_y = preprocess_data(train_X, test_X, train_y, test_y)

Read preprocessed data from cache file: preprocessed_data.pkl


## Transform the data

For the model I am going to build, I will construct a feature representation which is very similar to a bag-of-words model. To start, I will represent each word as an integer. Of course, some of the words that appear in the reviews occur very infrequently and so likely don't carry much information for the purposes of sentiment analysis. One way to deal with this problem is to fix the size of the working vocabulary and only include the words that appear most frequently. All of the infrequent words are then combined into a single category which, in this case, I will label `1`.

Since I will be using a recurrent neural network, it will be convenient if the length of each review is the same. To do this, I will fix a size for reviews and then pad short reviews with the category 'no word' (which I will label `0`) and truncate long reviews.
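Before going through the real code, it may help to see the whole encoding idea on a tiny made-up corpus. The sketch below is an illustration only: the names `toy_build_dict` and `toy_encode` and the three-review corpus are hypothetical, and the vocabulary size and padding length are far smaller than the 5000 and 500 used in this notebook.

```python
from collections import Counter

def toy_build_dict(sentences, vocab_size):
    """Map the most frequent words to integers 2, 3, ...; 0 and 1 stay reserved."""
    counts = Counter(word for sentence in sentences for word in sentence)
    most_common = [word for word, _ in counts.most_common(vocab_size - 2)]
    return {word: idx + 2 for idx, word in enumerate(most_common)}

def toy_encode(word_dict, sentence, pad):
    """Encode a sentence, using 1 for unknown words and 0 for padding."""
    encoded = [word_dict.get(word, 1) for word in sentence[:pad]]
    return encoded + [0] * (pad - len(encoded))

corpus = [["good", "movi"], ["bad", "movi"], ["good", "good", "plot"]]
vocab = toy_build_dict(corpus, vocab_size=4)  # room for only 2 real words
print(vocab)                                                # {'good': 2, 'movi': 3}
print(toy_encode(vocab, ["good", "plot", "movi"], pad=5))   # [2, 1, 3, 0, 0]
```

The word *plot* is too rare to make it into the vocabulary, so it is encoded as `1`, and the two trailing zeros are padding.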

### Create a word dictionary

To begin with, you need to find a way to map words that appear in the reviews to integers. Here I am going to fix the size of the vocabulary (including the 'no word' and 'infrequent' categories) to be 5000 but you may wish to change this to see how it affects the model.

In [11]:
from collections import Counter

import numpy as np

def build_dict(data, vocab_size = 5000):
    '''Construct and return a dictionary mapping each of the most frequently appearing words to a unique integer'''

    # Determine how often each word appears in `data`
    # `data` is a list of sentences and sentence is a list of words
    words = []
    for sentence in data:
        for word in sentence:
            words.append(word)
    
    # A dict storing the words that appear in the reviews along with how often they occur
    word_count = dict(Counter(words))
    
    # Sort the words found in `data` so that sorted_words[0] is the most frequently appearing word and
    # sorted_words[-1] is the least frequently appearing word.
    sorted_words = sorted(word_count, key = word_count.get, reverse = True)
    
    word_dict = {} # This is going to be the end result, a dictionary that translates words into integers
    for idx, word in enumerate(sorted_words[:vocab_size - 2]): # The -2 leaves room for the
                                                               # 'no word' and 'infrequent' labels
        word_dict[word] = idx + 2 # Labels 0 and 1 are reserved, so real words start at 2
        
    return word_dict

In [12]:
word_dict = build_dict(train_X)

Let's use the result to find out what the five most frequent words are.

In [13]:
from itertools import islice

list(islice(word_dict, 5))

['movi', 'film', 'one', 'like', 'time']

By looking at these words, you can tell that it makes sense that they appear the most frequently in movie reviews.

### Save `word_dict`

Later on, `word_dict` will be needed to encode incoming reviews. A user will send text to the app, and it needs to be transformed before being passed to the model to get a prediction. So I'm going to save the dictionary for future use.

In [14]:
data_dir = './data/pytorch' # The folder used for storing data
if not os.path.exists(data_dir): # Make sure that the folder exists
    os.makedirs(data_dir)

In [15]:
with open(os.path.join(data_dir, 'word_dict.pkl'), 'wb') as f:
    pickle.dump(word_dict, f)

### Transform the reviews

Now I can use the dictionary to transform the words appearing in the reviews into integers. I will make sure to pad or truncate each one to a fixed length of 500.

In [16]:
def convert_and_pad(word_dict, sentence, pad = 500):
    NOWORD = 0 # Use 0 to represent the 'no word' category
    INFREQ = 1 # Use 1 to represent the infrequent words, i.e., words not appearing in word_dict
    
    working_sentence = [NOWORD] * pad
    
    for word_index, word in enumerate(sentence[:pad]):
        if word in word_dict:
            working_sentence[word_index] = word_dict[word]
        else:
            working_sentence[word_index] = INFREQ
            
    return working_sentence, min(len(sentence), pad)

def convert_and_pad_data(word_dict, data, pad = 500):
    result = []
    lengths = []
    
    for sentence in data:
        converted, leng = convert_and_pad(word_dict, sentence, pad)
        result.append(converted)
        lengths.append(leng)
        
    return np.array(result), np.array(lengths)

In [17]:
train_X, train_X_len = convert_and_pad_data(word_dict, train_X)
test_X, test_X_len = convert_and_pad_data(word_dict, test_X)

As a quick check, let's take a look at one of the transformed reviews.

In [18]:
print(train_X[100])
print('\nlength', len(train_X[100]))

[1052    1   90  172  849  102    3 4254    1 1062  476 2408 4828  897
    1  405  811 1333  458   86  530  348 3739 3713 1688 4513    1 4180
 1013  483 1072  279    1 1183  887    3  564  120 4919 1550    1  476
  367 1062  476   28 1315  580    1  302  896  410   88  752  302    3
  302   33    5    1    1    3 1063    1 1268  741  312 4977  917  520
  191  475   38  549  121 1866  880  106    1  545  113  948  635   83
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0 

The output looks reasonable. Most of the vector is padded with `0`, so it is a rather short review. The majority of the words belong to the frequent category, while a few belong to the infrequent category `1`. It all makes sense. And finally, you can see that it keeps the desired length of 500.

So the text above is processed using two functions, `preprocess_data` and `convert_and_pad_data`. Let's analyze what they actually do. `preprocess_data` applies `review_to_words` to every review if it is run for the first time. Otherwise, it just reads the previous result from the cache. You already saw above what `review_to_words` does. The only problem that I can see with this solution is that if you change your preprocessing code without clearing the cache, you won't see the effect, as the old cache will still be loaded. So you need to keep this in mind.
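One possible way to reduce the risk of loading a stale cache is to derive the cache file name from the preprocessing settings, so that changing a setting automatically points to a new file. This is only a sketch of the idea and not something this notebook does: the helper `cache_file_for` and the settings dictionary are hypothetical.

```python
import hashlib

def cache_file_for(settings, prefix="preprocessed_data"):
    """Return a cache file name that changes whenever `settings` change."""
    canonical = repr(sorted(settings.items()))  # stable order regardless of dict order
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return "{}_{}.pkl".format(prefix, digest)

v1 = cache_file_for({"stemmer": "porter", "remove_stopwords": True})
v2 = cache_file_for({"stemmer": "porter", "remove_stopwords": False})
print(v1)
print(v2)
```

The resulting name could then be passed as the `cache_file` argument of `preprocess_data`. It only helps if you remember to describe every relevant change in the settings, so it is a convenience rather than a guarantee.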

`convert_and_pad_data` does exactly what you can see above. It converts every word in a review to an integer. There are two special integers: `0` is used for padding and `1` for an 'infrequent word', that is, any word outside the 4998 most frequent ones. The potential disadvantage of this solution is that only 500 words from each review are processed. In the case of a longer review, you will lose some information, as only its first 500 words are kept. All in all, I believe it is a reasonable trade-off, as most reviews will be shorter than that, and a larger fixed length would only mean keeping more padding `0`s in memory.
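If you want to check how much information the truncation actually costs, you can measure the share of reviews that are longer than the padding length. The helper below is a hypothetical sketch (`share_truncated` is not part of the notebook) and it is demonstrated on made-up data. To use it on the real reviews, it would have to be run on the word lists returned by `preprocess_data`, before `convert_and_pad_data` overwrites `train_X`.

```python
def share_truncated(tokenized_reviews, pad=500):
    """Return the fraction of reviews that have more than `pad` tokens."""
    if not tokenized_reviews:
        return 0.0
    too_long = sum(1 for review in tokenized_reviews if len(review) > pad)
    return too_long / len(tokenized_reviews)

# Made-up reviews with 3, 7, 5 and 2 tokens
toy_reviews = [["w"] * 3, ["w"] * 7, ["w"] * 5, ["w"] * 2]
print(share_truncated(toy_reviews, pad=5))  # 0.25
```

Only the review with 7 tokens exceeds the limit of 5, hence one quarter of the toy reviews would be truncated.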

## Step 3: Copy the data to S3

Until now I was working with the data locally in the notebook. But SageMaker training runs on separate AWS instances, and they won't have access to any dataset until you upload it to the cloud. The data is prepared now, so I will upload it to an S3 bucket.

### Save the dataset locally

I am going to save it locally first. Each row will start with `label`, `length` and then will come 500 integer-words.

In [19]:
import pandas as pd
    
pd.concat([pd.DataFrame(train_y), pd.DataFrame(train_X_len), pd.DataFrame(train_X)], axis = 1) \
        .to_csv(os.path.join(data_dir, 'train.csv'), header = False, index = False)
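To make the row layout concrete, here is a small standard-library sketch that writes and reads back one row in the same `label, length, words...` order. The helper names `make_row` and `split_row` are hypothetical, and the review is shortened to 5 integers instead of 500 for readability.

```python
import csv
import io

def make_row(label, length, encoded_review):
    """Lay out one training row as: label, length, then the encoded words."""
    return [label, length] + list(encoded_review)

def split_row(row):
    """Inverse of make_row for a row read back from CSV (all values are strings)."""
    values = [int(v) for v in row]
    return values[0], values[1], values[2:]

buffer = io.StringIO()
csv.writer(buffer).writerow(make_row(1, 3, [2, 1, 3, 0, 0]))
line = buffer.getvalue().strip()
print(line)  # 1,3,2,1,3,0,0

label, length, words = split_row(next(csv.reader([line])))
print(label, length, words)  # 1 3 [2, 1, 3, 0, 0]
```

There is no header row, which matches the `header = False` argument used above, so the consumer has to know the column order.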

### Upload to the cloud

Next, I can upload it to the S3 bucket. I am just going to upload the entire `data/pytorch` directory. It means that I will store not only the training set but also the word dictionary `word_dict.pkl`. It will come in handy later, when you want to predict the sentiment of new data. The app will use it to process the input.

In [20]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/sentiment_rnn'

role = sagemaker.get_execution_role()

In [21]:
input_data = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)

## Step 4: Train the model

The SageMaker model is made of three objects:

 - model artifacts,
 - training code,
 - inference code.
 
Model artifacts are the result of training, for example the model weights. The code for training and inference, just like the data, needs to be packaged into containers and uploaded to AWS.

You can find the training code in the `train` directory. Let's take a look at `model.py`. It defines the neural network in PyTorch.

In [22]:
!pygmentize train/model.py

import torch.nn as nn

class LSTMClassifier(nn.Module):
    '''This is the simple RNN model for Sentiment Analysis'''

    def __init__(self, embedding_dim, hidden_dim, vocab_size):
        '''Initialize the model and choose its parameters'''
        super(LSTMClassifier, self).__init__()

        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.dense = nn.Linear(in_features=hidden_dim, out_features=1)
        self.sig = nn.Sigmoid()
        
        self.word_dict = None

    [

The important takeaway from the code is that there are three parameters that you may wish to fine tune later. I made them available in the constructor, so you can do it easily. They are the embedding dim, hidden dim and vocab size.
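To get a feeling for how these three parameters drive the model size, you can count the trainable parameters by hand. The sketch below is plain arithmetic based on the layers visible in the constructor above (one embedding, one single-layer `nn.LSTM`, one linear output unit); the function name is hypothetical. It relies on PyTorch's LSTM layout of four gates, each with input weights, hidden weights and two bias vectors.

```python
def lstm_classifier_param_count(embedding_dim, hidden_dim, vocab_size):
    """Count trainable parameters of embedding -> single-layer LSTM -> linear(1)."""
    embedding = vocab_size * embedding_dim
    # 4 gates, each with input weights, hidden weights and two bias vectors
    lstm = 4 * hidden_dim * (embedding_dim + hidden_dim) + 8 * hidden_dim
    dense = hidden_dim + 1  # weights plus one bias for the single output
    return embedding + lstm + dense

print(lstm_classifier_param_count(32, 100, 5000))  # 213701
```

With these values most of the parameters (160000) sit in the embedding layer, so the vocabulary size and the embedding dimension matter at least as much as the hidden dimension.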

I'm going to play with the code from `train.py` here. Before running it on SageMaker, I want to test it on a small set of only 250 reviews. I will load them first.

In [23]:
import torch
import torch.utils.data

# Read in only the first 250 rows
train_sample = pd.read_csv(os.path.join(data_dir, 'train.csv'), header = None, names = None, nrows = 250)

# Turn the input pandas dataframe into tensors
train_sample_y = torch.from_numpy(train_sample[[0]].values).float().squeeze()
train_sample_X = torch.from_numpy(train_sample.drop([0], axis=1).values).long()

# Build the dataset
train_sample_ds = torch.utils.data.TensorDataset(train_sample_X, train_sample_y)
# Build the dataloader
train_sample_dl = torch.utils.data.DataLoader(train_sample_ds, batch_size = 50)

### Training method

And now the code of the training method.

In [24]:
def train(model, train_loader, epochs, optimizer, loss_fn, device):
    for epoch in range(1, epochs + 1):
        model.train()
        total_loss = 0
        
        for batch in train_loader:         
            batch_X, batch_y = batch
            
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device)
            
            # Clear the gradients
            optimizer.zero_grad()
            # Forward pass
            output = model(batch_X)
            # Calculate the batch loss
            loss = loss_fn(output, batch_y)
            # Backward pass
            loss.backward(retain_graph = True)
            # Parameter update
            optimizer.step()
            
            total_loss += loss.item()
            
        print("Epoch: {}, BCELoss: {}".format(epoch, total_loss / len(train_loader)))

Let's run it on the small dataset.

In [25]:
import torch.optim as optim
from train.model import LSTMClassifier

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LSTMClassifier(32, 100, 5000).to(device)
optimizer = optim.Adam(model.parameters())
loss_fn = torch.nn.BCELoss()

train(model, train_sample_dl, 5, optimizer, loss_fn, device)

Epoch: 1, BCELoss: 0.6937276005744935
Epoch: 2, BCELoss: 0.6814779281616211
Epoch: 3, BCELoss: 0.6711473226547241
Epoch: 4, BCELoss: 0.6603154182434082
Epoch: 5, BCELoss: 0.6475648880004883


The loss decreases with every epoch, so the training loop works as expected.

The last file in the directory is `requirements.txt`. It is used to recreate the environment, so it should list the packages the code uses, like `numpy` and `nltk`. Now the training directory is complete and can be used to create a container.

### Train the model

SageMaker will execute `train.py` as a script, starting from its main block. I use an argument parser there, because SageMaker passes the hyperparameters as command-line arguments.
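The argument handling could look roughly like the sketch below. It is not a copy of the repository's `train.py`: the function name `parse_args`, the default values and the local fallback paths are assumptions. What it relies on is that SageMaker passes hyperparameters as `--name value` arguments and exposes locations through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING` (the latter matching a channel named `training`).

```python
import argparse
import os

def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths (defaults are assumptions)."""
    parser = argparse.ArgumentParser()
    # Hyperparameters, sent by SageMaker as command-line arguments
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--hidden_dim", type=int, default=100)
    # Locations, read from the environment with local fallbacks for testing
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--data-dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAINING", "./data"))
    return parser.parse_args(argv)

# Simulate the arguments a training job with two hyperparameters would receive
args = parse_args(["--epochs", "10", "--hidden_dim", "200"])
print(args.epochs, args.hidden_dim)  # 10 200
```

Passing an explicit list to `parse_args` makes the function easy to try out in a notebook, while `parse_args()` without arguments reads `sys.argv` as usual inside the training container.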

Now I'm going to use all that code to train the actual model. As you can see below, it is enough to pass the `train` directory to SageMaker, and it will create the container itself.

In [26]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point = 'train.py',
                    source_dir = 'train',
                    role = role,
                    framework_version = '0.4.0',
                    train_instance_count = 1,
                    # train_instance_type='ml.p2.xlarge',
                    # I use  this instance instead because my limit is 0 and I don't want to wait
                    train_instance_type = 'ml.m4.xlarge',
                    hyperparameters = {
                        'epochs': 10,
                        'hidden_dim': 200,
                    })

In [27]:
estimator.fit({'training': input_data})

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


2020-11-04 01:21:28 Starting - Starting the training job...
2020-11-04 01:21:30 Starting - Launching requested ML instances......
2020-11-04 01:22:54 Starting - Preparing the instances for training......
2020-11-04 01:23:40 Downloading - Downloading input data...
2020-11-04 01:24:23 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-11-04 01:24:24,269 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2020-11-04 01:24:24,272 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-11-04 01:24:24,286 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2020-11-04 01:24:24,290 sagemaker_pytorch_container.training INFO     Invoking user training script.
2020-11-04 01:24:24,500 sagemaker-containers INFO    

Epoch: 1, BCELoss: 0.675865929953906
Epoch: 2, BCELoss: 0.6087629515297559
Epoch: 3, BCELoss: 0.5433617891097555
Epoch: 4, BCELoss: 0.4614679637003918
Epoch: 5, BCELoss: 0.4009826846268712
Epoch: 6, BCELoss: 0.3666040392554536
Epoch: 7, BCELoss: 0.35790663653490495
Epoch: 8, BCELoss: 0.3162726644350558
Epoch: 9, BCELoss: 0.2921612065057365

2020-11-04 03:04:03 Uploading - Uploading generated training model
Epoch: 10, BCELoss: 0.28056296128399516
2020-11-04 03:03:59,558 sagemaker-containers INFO     Reporting training SUCCESS

2020-11-04 03:04:09 Completed - Training job completed
Training seconds: 6029
Billable seconds: 6029


## Step 5: Test the model

As I said before, instead of running a batch transform job, I will test the model by deploying it to an endpoint first.

## Step 6: Deploy the model for testing

To recap, the model takes as input vectors of the form `review_length, review[500]`, where `review[500]` is a sequence of `500` integer-encoded words produced with `word_dict`. SageMaker has a built-in `predictor.predict()` function for models with simple input like this one.

The only thing you need to provide is a function that loads the trained model. It must be called `model_fn()` and take a single parameter, the path to the directory with the model. It also needs to be defined in the file that was passed as `entry_point`. From the code above, you can see that I used `train.py`. If you want to take a look at my `model_fn()`, just go to this file.

But first, let's deploy the model. After running the code below, SageMaker will launch a compute instance with it that can be accessed by using an endpoint.

In [28]:
# Deploy the trained model
predictor = estimator.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


-------------!

## Step 7: Use the model for testing

Now you can send data to it and collect the predictions. I'm going to combine `test_X` with its length information into one dataframe. My local `predict` function then splits the dataset into smaller arrays, sends them one by one and gathers the results. This trick makes sending the data more efficient.

In [29]:
test_X = pd.concat([pd.DataFrame(test_X_len), pd.DataFrame(test_X)], axis = 1)

In [30]:
# Split the data into chunks and send each chunk separately, accumulating the results

def predict(data, rows=512):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = np.array([])
    for array in split_array:
        predictions = np.append(predictions, predictor.predict(array))
    
    return predictions
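The chunking idea can be tried out without a live endpoint. The sketch below is a standard-library illustration, not the function above: `predict_in_chunks` and `fake_predict` are hypothetical names, and `fake_predict` merely stands in for `predictor.predict` so that the splitting and the order of the gathered results can be checked locally.

```python
def chunked(rows, chunk_size):
    """Yield consecutive slices of `rows` with at most `chunk_size` items each."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

def predict_in_chunks(rows, predict_fn, chunk_size=512):
    """Call `predict_fn` once per chunk and concatenate the results in order."""
    results = []
    for chunk in chunked(rows, chunk_size):
        results.extend(predict_fn(chunk))
    return results

calls = []

def fake_predict(chunk):
    """Stand-in for an endpoint call: records the chunk size, echoes the row length."""
    calls.append(len(chunk))
    return [row[0] for row in chunk]

rows = [[length, 0, 0] for length in range(10)]
print(predict_in_chunks(rows, fake_predict, chunk_size=4))  # [0, 1, 2, ..., 9]
print(calls)                                                # [4, 4, 2]
```

Ten rows with a chunk size of 4 result in three requests, and the predictions come back in the same order as the input rows.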

In [31]:
predictions = predict(test_X.values)
predictions = [round(num) for num in predictions]

In [32]:
from sklearn.metrics import accuracy_score
accuracy_score(test_y, predictions)

0.85192

An accuracy above 85% looks decent. And if you look at the training output, the loss was still decreasing after the 10 epochs the model was trained for, so it had not converged yet. This suggests that it could improve even further with more training.
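The evaluation above boils down to two steps: turning the sigmoid outputs into labels and comparing them with the true labels. Here is a minimal standard-library sketch of both steps on made-up numbers. The helper names `to_labels` and `accuracy` are hypothetical, and the explicit threshold is a swapped-in alternative to `round`, which in Python 3 rounds exact halves to the nearest even number (so `round(0.5)` is `0`).

```python
def to_labels(probabilities, threshold=0.5):
    """Map each probability to 1 (positive) or 0 (negative)."""
    return [1 if p >= threshold else 0 for p in probabilities]

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up model outputs and labels, for illustration only
toy_probabilities = [0.91, 0.12, 0.50, 0.73]
toy_labels = [1, 0, 0, 1]

toy_predictions = to_labels(toy_probabilities)
print(toy_predictions)                        # [1, 0, 1, 1]
print(accuracy(toy_labels, toy_predictions))  # 0.75
```

Whether a borderline `0.5` counts as positive or negative rarely matters in practice, but an explicit threshold makes the choice visible and easy to change.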

### More testing

If you think about it, it would also be useful to see how the model performs on a previously unseen review. Ultimately, I would like to be able to send reviews as strings and get predictions for them. I am going to create one for testing.

In [33]:
test_review = 'The simplest pleasures in life are the best, and this film is one of them. Combining a rather basic storyline of love and adventure this movie transcends the usual weekend fair with wit and unmitigated charm.'

But the model can't work with text directly. The review needs to be preprocessed and encoded as integers first. All you need to do is apply the functions `review_to_words` and `convert_and_pad` that you saw before. Then just prepend the review length so you have `review_length, review[500]`.

In [34]:
# Convert test_review into a form usable by the model and save the results in test_data
review_words, length = convert_and_pad(word_dict, review_to_words(test_review))
# create np array of the form [[length, review_words[0], review_words[1], ...]]
test_data = np.array([np.array([length] + review_words)])
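
The row layout can be illustrated with a toy version of the encoding step. The `toy_convert_and_pad` function below is a hypothetical stand-in for `convert_and_pad`, assuming that `0` marks padding and `1` marks words missing from the dictionary. The real function and `word_dict` are defined earlier in the notebook and may differ.

```python
import numpy as np

def toy_convert_and_pad(word_dict, words, pad=500):
    # Assumed encoding: 0 = padding, 1 = word not in the dictionary
    NO_WORD, UNKNOWN = 0, 1
    encoded = [NO_WORD] * pad
    for i, word in enumerate(words[:pad]):
        encoded[i] = word_dict.get(word, UNKNOWN)
    return encoded, min(len(words), pad)

toy_dict = {'movi': 2, 'great': 3}          # made-up vocabulary
encoded, length = toy_convert_and_pad(toy_dict, ['great', 'movi', 'zzz'], pad=6)

# Same layout as test_data: one row of [length, word_0, word_1, ...]
row = np.array([np.array([length] + encoded)])
print(row)
```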

The model can process the review now.

In [35]:
predictor.predict(test_data)

array(0.8777149, dtype=float32)

Since the return value is close to `1`, the prediction is 'positive', which is correct.

### Delete the endpoint

The model passed all the tests. I can shut down the endpoint now.

In [36]:
estimator.delete_endpoint()

estimator.delete_endpoint() will be deprecated in SageMaker Python SDK v2. Please use the delete_endpoint() function on your predictor instead.


## Step 6 (again): Deploy the model for the web app

As you could see, by default the model works only if you pass it preprocessed data. But it would be useful if it could preprocess the review by itself. Then you could just send a raw review and receive a prediction.

This is what inference code is meant to do. I am going to store it in the `serve` directory. SageMaker will then make a container out of it and use it when making predictions. It will contain a few files:

- `model.py` having the same model code as in `train`,
- `utils.py` containing `review_to_words` and `convert_and_pad` functions that will be used for pre-processing,
- `predict.py` where I will put inference code,
- `requirements.txt` that lists the Python libraries that need to be installed for the code to work.

I already have code for all the files but `predict.py`. What exactly does it need to contain? SageMaker expects it to have four functions:

 - `model_fn`: loads the model, the same as in `train`,
 - `input_fn`: takes the raw input and makes it available for the inference code,
 - `output_fn`: transforms the results before sending them back to the caller,
 - `predict_fn`: takes the input, does preprocessing, passes it to the model and returns predictions.

For the simple app that I am building here, `input_fn` and `output_fn` don't need to do much. The former decodes the `utf-8` bytes sent by the webpage into plain text. The latter just returns the model output. With image data, for example, deserializing and serializing would take much more work.
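
As a rough sketch of what such minimal `input_fn` and `output_fn` functions can look like, consider the code below. It is an illustration under my own assumptions, not necessarily the exact content of `predict.py`. The `toy_predict_fn` is a hypothetical stand-in that replaces the real model with a keyword check, so the chain can run without PyTorch.

```python
def input_fn(serialized_input_data, content_type):
    # Decode the raw request body into a plain string
    if content_type == 'text/plain':
        return serialized_input_data.decode('utf-8')
    raise ValueError('Unsupported content type: ' + content_type)

def output_fn(prediction_output, accept):
    # Send the prediction back as text, without further processing
    return str(prediction_output)

def toy_predict_fn(review_text):
    # Hypothetical stand-in for predict_fn: no model, just a keyword check
    return 1 if 'good' in review_text.lower() else 0

# The order in which the pieces are chained for a single request
request_body = 'A good film.'.encode('utf-8')
review = input_fn(request_body, 'text/plain')
response_body = output_fn(toy_predict_fn(review), 'text/plain')
print(response_body)
```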

### Write the inference code

Let's complete the `predict.py` and write code for the four functions.

In [37]:
!pygmentize serve/predict.py

import argparse
import json
import os
import pickle
import sys
import sagemaker_containers
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data

from

### Deploying the model

The inference code is ready and I can deploy the model. But first, I need to construct a new `PyTorchModel` and make sure that it points to the right elements: the model artifacts that resulted from training, the directory with the inference code, and its entry point, `predict.py`.

By default, a deployed PyTorch model assumes that any input takes the form of a `numpy` array. In this case, I want the container to take in review strings, so I wrap the predictor class with `StringPredictor`.

In [38]:
from sagemaker.predictor import RealTimePredictor
from sagemaker.pytorch import PyTorchModel

class StringPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(StringPredictor, self).__init__(endpoint_name, sagemaker_session, content_type = 'text/plain')

model = PyTorchModel(model_data = estimator.model_data,
                     role = role,
                     framework_version = '0.4.0',
                     entry_point = 'predict.py',
                     source_dir = 'serve',
                     predictor_cls = StringPredictor)
predictor = model.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


--------------!

### Testing the model

Let's test the deployed model by sending it 250 positive and 250 negative reviews. I won't send the entire testing set though. The model processes reviews one by one and both the input and output need to be sent over the Internet, so it would take quite a long time.

In [39]:
import glob
import os

def test_reviews(data_dir='../data/aclImdb', stop = 250):
    
    results = []
    ground = []
    
    # Test both positive and negative reviews    
    for sentiment in ['pos', 'neg']:
        
        path = os.path.join(data_dir, 'test', sentiment, '*.txt')
        files = glob.glob(path)
        
        files_read = 0
        
        print('Starting ', sentiment, ' files')
        
        # Iterate through the files and send them to the predictor
        for f in files:
            with open(f) as review:
                # First store the true label
                if sentiment == 'pos':
                    ground.append(1)
                else:
                    ground.append(0)
                # Read in the review and convert to 'utf-8' for transmission via HTTP
                review_input = review.read().encode('utf-8')
                # Send the review to the predictor and store the results
                results.append(int(predictor.predict(review_input)))
                
            # Sending reviews to the endpoint one at a time takes a while
            # So stop after reaching 'stop' value
            files_read += 1
            if files_read == stop:
                break
            
    return ground, results

In [40]:
ground, results = test_reviews()

Starting  pos  files
Starting  neg  files


In [41]:
from sklearn.metrics import accuracy_score
accuracy_score(ground, results)

0.838

The result is more or less the same as on the entire testing set. Some difference is expected because the subset can, for example, contain more difficult inputs on average than the whole set.
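
To get a feeling for how much difference to expect, here is a back-of-the-envelope calculation of the standard error of an accuracy measured on 500 reviews. It assumes that the 500 reviews behave like a random sample from the testing set, which is a simplification because `test_reviews` just takes the first files it finds.

```python
import math

def accuracy_standard_error(accuracy, n_samples):
    # Standard error of a proportion: sqrt(p * (1 - p) / n)
    return math.sqrt(accuracy * (1 - accuracy) / n_samples)

full_set_accuracy = 0.85192   # accuracy on the entire testing set
subset_accuracy = 0.838       # accuracy on the 500 reviews sent one by one

se = accuracy_standard_error(full_set_accuracy, 500)
gap = full_set_accuracy - subset_accuracy
print(round(se, 4), round(gap, 4))
```

Under that assumption the gap between the two results is smaller than one standard error, so it is well within what sampling alone could explain.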

As an additional test, you can send the `test_review` that I used earlier.

In [42]:
predictor.predict(test_review)

b'1'

It again predicts the correct, positive sentiment.

## Step 7 (again): Use the model for the web app

So far I have been interacting with the deployed model from this notebook, which has an IAM role that allows seamless communication with the endpoint. Communicating with the model from a webpage is a different situation: to access SageMaker directly, the app would have to authenticate with AWS on every request. However, I can still enable seamless communication between the app and SageMaker using AWS Lambda.

<img src="images/web_app_diagram.svg">

You can see the architecture of this solution in the diagram above. The central piece is the Lambda function. You can think of it as a Python function that is executed in response to an event. In this case it gets its own endpoint, a URL created using API Gateway. Once the app sends data to this endpoint, it triggers the Lambda function. The Lambda function, running on AWS with its own IAM role, calls the model's endpoint, asks it for a prediction and sends the result back to the app. In other words, the Lambda function acts as an intermediary between the app and the model, so the app itself never has to authenticate with AWS.

### Set up a Lambda function

#### Part A: Create an IAM Role for the Lambda function

Since you want the Lambda function to call a SageMaker endpoint, it needs a proper IAM role to do so. An IAM role is an AWS identity with permission policies attached that a service such as Lambda can assume. If you want to create it on your own, you can follow the steps below.

Using the AWS Console, navigate to the **IAM** page and click on **Roles**. Then, click on **Create role**. Make sure that the **AWS service** is the type of trusted entity selected and choose **Lambda** as the service that will use this role, then click **Next: Permissions**.

In the search box type `sagemaker` and select the check box next to the **AmazonSageMakerFullAccess** policy. Then, click on **Next: Review**.

Lastly, give this role a name. Make sure you use a name that you will remember later on, for example `LambdaSageMakerRole`. Then, click on **Create role**.
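
For reference, a role like this consists of two JSON policy documents: a trust policy that lets Lambda assume the role and a permissions policy. The sketch below builds them as Python dictionaries. The narrower `sagemaker:InvokeEndpoint` permission is an assumption on my side, shown as a possible alternative to **AmazonSageMakerFullAccess**; it is not what the console steps above configure.

```python
import json

# Trust policy: allows the Lambda service to assume the role
trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'lambda.amazonaws.com'},
        'Action': 'sts:AssumeRole',
    }],
}

# Assumed, narrower alternative to AmazonSageMakerFullAccess:
# only permits invoking endpoints
invoke_only_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Action': 'sagemaker:InvokeEndpoint',
        'Resource': '*',
    }],
}

trust_json = json.dumps(trust_policy, indent=2)
invoke_json = json.dumps(invoke_only_policy, indent=2)
print(trust_json)
print(invoke_json)
```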

#### Part B: Create a Lambda function

Now it is time to actually create the Lambda function. Again, if you want to do it yourself, instructions are below.

Using the AWS Console, navigate to the AWS Lambda page and click on **Create a function**. When you get to the next page, make sure that **Author from scratch** is selected. Now, name your Lambda function, using a name that you will remember later on, for example `sentiment_analysis_func`. Make sure that the **Python 3.6** runtime is selected and then choose the role that you created in the previous part. Then, click on **Create Function**.

On the next page you will see some information about the Lambda function you've just created. If you scroll down you should see an editor in which you can write the code that will be executed when your Lambda function is triggered. For this project, I will use the code below.

```python
# Use the low-level library to interact with SageMaker since the SageMaker API
# is not available natively through Lambda.
import boto3

def lambda_handler(event, context):

    # To invoke the endpoint, you need the SageMaker runtime
    runtime = boto3.Session().client('sagemaker-runtime')

    # Use that runtime to invoke the model's endpoint, by sending the review received from the app
    response = runtime.invoke_endpoint(EndpointName = '**ENDPOINT NAME HERE**',    # Name of the created endpoint
                                       ContentType = 'text/plain',                 # The data format expected by the endpoint
                                       Body = event['body'])                       # The actual review

    # The result is an HTTP response, the body contains the prediction
    result = response['Body'].read().decode('utf-8')

    return {
        'statusCode' : 200,
        'headers' : { 'Content-Type' : 'text/plain', 'Access-Control-Allow-Origin' : '*' },
        'body' : result
    }
```
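
If you want to try the handler logic without touching AWS, one option is to pass the runtime client in as an argument and replace it with a fake. The sketch below does that. `FakeRuntime`, `handle` and the endpoint name are hypothetical names introduced only for this local check; the fake always answers `1`.

```python
import io

class FakeRuntime:
    # Hypothetical stand-in for the 'sagemaker-runtime' client
    def invoke_endpoint(self, EndpointName, ContentType, Body):
        return {'Body': io.BytesIO(b'1')}

def handle(event, runtime, endpoint_name):
    # Same steps as lambda_handler, with the client injected
    response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                       ContentType='text/plain',
                                       Body=event['body'])
    result = response['Body'].read().decode('utf-8')
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'text/plain', 'Access-Control-Allow-Origin': '*'},
        'body': result
    }

# With proxy integration, the review arrives under the 'body' key of the event
event = {'body': 'Masterpiece. Everyone should see it!'}
response = handle(event, FakeRuntime(), 'hypothetical-endpoint-name')
print(response)
```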

Once you have copied and pasted the code above into the Lambda code editor, replace the `**ENDPOINT NAME HERE**` portion with the name of the model's endpoint. You can get it by running the code below.

In [43]:
predictor.endpoint

'sagemaker-pytorch-2020-11-04-04-24-19-257'

Once you have added the endpoint name to the Lambda function, click on **Save**. Your Lambda function is now up and running. 

### Set up API Gateway

Now that the Lambda function is set up, it's time to create an endpoint for it using API Gateway. This URL will be used to trigger the Lambda function.

Using the AWS Console, navigate to **Amazon API Gateway** and then click on **Get started**.

On the next page, make sure that **New API** is selected and give the new API a name, for example, `sentiment_analysis_api`. Then, click on **Create API**.

You have now created an API, but it doesn't do anything yet. What you want it to do is trigger the Lambda function that you created earlier.

Select the **Actions** dropdown menu and click **Create Method**. A new blank method will be created, select its dropdown menu and select **POST**, then click on the check mark beside it.

For the integration point, make sure that **Lambda Function** is selected and click on **Use Lambda Proxy integration**. This option makes sure that the data sent to the API is passed directly to the Lambda function with no processing. It also means that the return value must be a proper response object, as it will not be processed by API Gateway either.
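
To make "a proper response object" more concrete, here is a simplified check of the shape that the Lambda function returns: an integer `statusCode`, a `headers` dictionary and a string `body`. The `is_proxy_response` helper is hypothetical and only a sketch, not a complete validation of what API Gateway accepts.

```python
def is_proxy_response(response):
    # Simplified shape check: status code, optional headers, string body
    return (
        isinstance(response, dict)
        and isinstance(response.get('statusCode'), int)
        and isinstance(response.get('headers', {}), dict)
        and isinstance(response.get('body'), str)
    )

good = {
    'statusCode': 200,
    'headers': {'Content-Type': 'text/plain', 'Access-Control-Allow-Origin': '*'},
    'body': '1'
}
bad = {'prediction': 1}   # a bare result without the response wrapper

print(is_proxy_response(good), is_proxy_response(bad))
```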

Type the name of the Lambda function you created earlier into the **Lambda Function** text entry box and then click on **Save**. Click on **OK** in the pop-up box that then appears, giving permission to API Gateway to invoke the Lambda function you created.

The last step in creating the API Gateway is to select the **Actions** dropdown and click on **Deploy API**. You will need to create a new Deployment stage and name it anything you like, for example `prod`.

You have now successfully set up a public API to access your SageMaker model. Make sure to copy or write down the URL provided to invoke your newly created public API as this will be needed in the next step. This URL can be found at the top of the page, highlighted in blue next to the text **Invoke URL**.
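
Before wiring the URL into a webpage, you could check the API from Python. The sketch below builds the same kind of request that the web app sends: a `POST` with the review as a plain-text body. The URL is a made-up placeholder and the function names are my own; replace the URL with your **Invoke URL** before calling `send_review`.

```python
import urllib.request

def build_review_request(api_url, review):
    # POST the raw review text, encoded as utf-8
    return urllib.request.Request(
        api_url,
        data=review.encode('utf-8'),
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )

def send_review(api_url, review):
    # Performs the actual network call and returns the response body as text
    with urllib.request.urlopen(build_review_request(api_url, review)) as response:
        return response.read().decode('utf-8')

# Placeholder URL, not a real API
API_URL = 'https://example.execute-api.us-east-1.amazonaws.com/prod'
request = build_review_request(API_URL, 'Masterpiece. Everyone should see it!')
print(request.get_method(), request.full_url)
```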

### Deploy the web app

Now that I have the API, the Lambda function and the model up and running, the last step is to prepare a webpage. I am going to use a simple static HTML file. It will communicate with the model through the publicly available API endpoint.

In the `website` folder, there is a file called `index.html`. All you need to do is download it to your computer and open it in a text editor. Next, find the comment **\*\*REPLACE WITH PUBLIC API URL\*\*** and replace the URL that I use below it with your URL from the previous step.

Now, if you open this file in your browser, the page will run locally and send its requests to the public API. You can use it to interact with your SageMaker model. You can play with it for a while, but remember to shut down the endpoint when you finish. Amazon will charge you for the time it is running.

I tested the app by submitting the review *Masterpiece. Everyone should see it!*. It predicted positive sentiment, so it works correctly. You can see the screenshot below.

<img src="images/web_app_example.png">

### Delete the endpoint

Remember to always shut down your endpoint if you are no longer using it. You are charged for the length of time that the endpoint is running so if you forget and leave it on you could end up with an unexpectedly large bill.

In [44]:
predictor.delete_endpoint()