# Sentiment Analysis

## Building an API in SageMaker

_Deep Learning Nanodegree Program | Deployment_

---

Now that we've built our own model for sentiment analysis in PyTorch, it's time to turn it into something fun! We will make this model accessible to the outside world and access it with a simple web app.


## Instructions

Some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this notebook. You will not need to modify the included code beyond what is requested. Sections that begin with '**TODO**' in the header indicate that you need to complete or implement some portion within them. Instructions will be provided for each section and the specifics of the implementation are marked in the code block with a `# TODO: ...` comment. Please be sure to read the instructions carefully!

In addition to implementing code, there will be questions for you to answer which relate to the task and your implementation. Each section where you will answer a question is preceded by a '**Question:**' header. Carefully read each question and provide your answer below the '**Answer:**' header by editing the Markdown cell.

> **Note**: Code and Markdown cells can be executed using the **Shift+Enter** keyboard shortcut. In addition, a cell can typically be edited by clicking it (double-click for Markdown cells) or by pressing **Enter** while it is highlighted.

## Modifying the inference code

In the previous notebook we constructed a custom model and two different Docker containers to work with it. The first container was used for training and the second for inference. When constructing the inference container, however, we assumed that the input would be a review already encoded as a sequence of integers. Our goal now is to create a simple web app that lets a user type out a review and then tells the user whether the review is positive or negative. This means we need to modify our inference code to accept a string and then transform that input inside the inference container.

To begin with, let us remind ourselves how we process a review in order to send it off for inference. We start by reading one of the reviews in our test set.

In [None]:
import os

review_text = None
with open(os.path.join('data', 'aclImdb', 'test', 'pos', '10000_7.txt')) as f:
    review_text = f.read()

In [None]:
review_text

Now that we've read a sample review, the first thing we need to do is remove the HTML tags and stopwords, and then stem the remaining words.

In [None]:
import nltk
nltk.download("stopwords")
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()

In [None]:
import re
from bs4 import BeautifulSoup

def review_to_words(review):
    stop_words = set(stopwords.words("english")) # Build the stopword set once rather than once per word
    text = BeautifulSoup(review, "html.parser").get_text() # Remove HTML tags
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower()) # Lower-case and replace non-alphanumeric characters with spaces
    words = text.split() # Split string into words
    words = [w for w in words if w not in stop_words] # Remove stopwords
    words = [stemmer.stem(w) for w in words] # Stem each word with the PorterStemmer created above
    
    return words

In [None]:
review_words = review_to_words(review_text)
review_words

Now that we've converted our review into usable words, we need to map those words to integers using the `word_dict` that we created from the training set. We also need to pad or truncate the resulting sequence so that it has a fixed length (500 words by default).

In [None]:
import pickle

data_dir = 'data/pytorch'

word_dict = None
with open(os.path.join(data_dir, 'word_dict.pkl'), "rb") as f:
    word_dict = pickle.load(f)

In [None]:
def convert_and_pad(data, word_dict, pad=500):
    NOWORD = 0 # Use 0 to represent the no word category
    INFREQ = 1 # Use 1 to represent infrequent words
    
    working_sentence = [NOWORD] * pad
    
    # We go through each word in the (possibly truncated) review and convert the words to integers
    for word_index, word in enumerate(data[:pad]):
        if word in word_dict:
            working_sentence[word_index] = word_dict[word]
        else:
            working_sentence[word_index] = INFREQ
            
    return working_sentence, min(len(data), pad)

In [None]:
review_data, review_length = convert_and_pad(review_words, word_dict)

In [None]:
review_length, review_data

And now we have input that can be sent to the neural network that we trained previously. To reiterate, given a review in string form, we need to do the following in order to determine the sentiment of the review:

- Convert the review to words (clean html, remove stopwords, etc.)
- Transform the words to integers using `word_dict` and pad / truncate the sequence
- Send the data through the neural network.
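To make the three steps concrete, here is a minimal, dependency-free sketch of the same pipeline. It is an illustration rather than the code in `sentiment_api.py`: it strips tags with a regular expression instead of BeautifulSoup, uses a tiny hypothetical stopword list instead of NLTK's, skips stemming, and replaces the trained network with a hypothetical `toy_model` function. The padding logic mirrors `convert_and_pad` above (0 for padding, 1 for words missing from `word_dict`).

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical, tiny stopword list; the notebook uses NLTK's full English list
# and also applies Porter stemming, which this sketch skips.
TOY_STOPWORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "was"}

def clean_review(review: str) -> List[str]:
    """Step 1 (simplified): strip HTML tags, lower-case, drop non-alphanumerics and stopwords."""
    text = re.sub(r"<[^>]+>", " ", review)  # crude tag removal instead of BeautifulSoup
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())
    return [w for w in text.split() if w not in TOY_STOPWORDS]

def encode_and_pad(words: List[str], word_dict: Dict[str, int], pad: int = 500) -> Tuple[List[int], int]:
    """Step 2: map words to integers (1 = infrequent/unknown) and pad or truncate to `pad`."""
    ids = [word_dict.get(w, 1) for w in words[:pad]]
    return ids + [0] * (pad - len(ids)), len(ids)

def predict_sentiment(review: str, word_dict: Dict[str, int],
                      model: Callable[[List[int], int], float]) -> int:
    """Step 3: run the encoded review through a model and threshold its score at 0.5."""
    data, length = encode_and_pad(clean_review(review), word_dict)
    return int(model(data, length) >= 0.5)

# Hypothetical vocabulary and stand-in "model" used only for illustration.
TOY_WORD_DICT = {"great": 2, "movie": 3, "terrible": 4}

def toy_model(data: List[int], length: int) -> float:
    return 1.0 if 2 in data[:length] else 0.0

print(predict_sentiment("A great movie!", TOY_WORD_DICT, toy_model))     # 1
print(predict_sentiment("A terrible movie.", TOY_WORD_DICT, toy_model))  # 0
```

The point of the sketch is the ordering: the cleaning step produces words, and only the integer-encoded, fixed-length sequence ever reaches the model, which is why the inference container needs `word_dict` at prediction time.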

The important takeaway here isn't the additional pre-processing of the input; that is relatively easy to do, since we only need to modify the existing code to incorporate this step. Instead, it is important to note that we need to include the `word_dict.pkl` file so that our inference code can use it to perform the second step above.

## Step 1: Build and Push new inference code

Now that we know what our inference code needs to do, we can make the necessary changes. The code for this has been provided and resides in the `api_container` folder. In particular, note that the `sentiment_api.py` file contains the code shown above to pre-process incoming data. Of course, in order to do this we need to make sure to include `word_dict.pkl`.

In [None]:
%cp data/pytorch/word_dict.pkl api_container/sentiment/

To recap, the changes between the original inference code and our new inference code are the following:

- `predictor.py` has been modified to pre-process incoming data,
- `sentiment_api.py` has been added, implementing the pre-processing methods,
- `word_dict.pkl` has been added,
- `train` has been removed so that this container can't accidentally be used for training, and
- `Dockerfile.cpu` has been modified so that our code has access to the nltk and BeautifulSoup libraries.
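As a hedged illustration of what "accept a string" means at the container boundary, here is a minimal sketch of how an inference handler might turn a raw `text/plain` request body into a review string before pre-processing. The function name `parse_text_request` and its error handling are hypothetical; they are not taken from the provided `predictor.py`.

```python
from typing import Union

def parse_text_request(body: Union[bytes, str], content_type: str) -> str:
    """Decode a raw request body into a review string (hypothetical helper)."""
    # Accept values such as "text/plain; charset=utf-8" by comparing only the media type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        raise ValueError("Unsupported content type: {}".format(content_type))
    # HTTP bodies arrive as bytes; the test code later in this notebook encodes reviews as UTF-8.
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return body.strip()
```

The decoded string would then go through the same `review_to_words` and `convert_and_pad` steps shown earlier before reaching the model.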

Now that this is done, we can run the `build_and_push.sh` script to make our container available on Amazon's Elastic Container Registry (ECR).

In [None]:
%cd api_container
!chmod +x ./build_and_push.sh
!./build_and_push.sh
%cd ..

## Step 2: Test the new inference container

Before getting into the details of setting up the web app we should make sure that our new inference container behaves the way we expect it to. To do this we will deploy and test our new container. Now, the way in which we will do this is a little different from the way that we did it in the previous notebook. This is because we want to use the model artifacts that we created in the previous notebook rather than creating a new model and training it from scratch.

### Creating the endpoint

Of course, we need to know where those model artifacts are stored. In the previous notebook when we trained our model we also recorded the location of the model artifacts. This location needs to be entered below and should begin with `s3://` and end with `model.tar.gz`.

In [None]:
model_artifacts = "s3://sagemaker-us-east-1-337425718252/output/sentiment-pytorch-gpu-2018-06-15-18-57-45-260/output/model.tar.gz"
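Before creating an endpoint, it can save a failed deployment to check that the value above has the expected shape. The helper below (`split_model_artifacts_uri`, a hypothetical name) only performs the checks described in the text, an `s3://` prefix and a `model.tar.gz` suffix, and splits the URI into bucket and key; it does not confirm that the object actually exists in S3.

```python
from typing import Tuple

def split_model_artifacts_uri(uri: str) -> Tuple[str, str]:
    """Check a model-artifact location and split it into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix) or not uri.endswith("model.tar.gz"):
        raise ValueError("Expected s3://<bucket>/.../model.tar.gz, got {!r}".format(uri))
    # Everything before the first "/" after the prefix is the bucket; the rest is the object key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError("URI is missing a bucket or key: {!r}".format(uri))
    return bucket, key
```

In the notebook you would call `split_model_artifacts_uri(model_artifacts)` right after setting the variable; a `ValueError` usually means a copy-and-paste slip.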

Now that we know where the model artifacts are stored, we can construct an endpoint using these model artifacts along with the inference container that we've built. To do this we will use the `endpoint_from_model_data()` method provided by the SageMaker Session object. For more details and additional methods provided by the Session object, please consult the [SageMaker documentation](http://sagemaker.readthedocs.io).

**Note**: It is important to name the endpoint something that you will remember as it will be required in the Lambda function that we create later to access the inference code.

In [None]:
import sagemaker as sage

sess = sage.Session() # Store the current SageMaker session
role = sage.get_execution_role() # Store our current IAM role

# We will also need our current account number and region in order to completely specify
# the name of the docker container we created earlier
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name

inference_image = '{}.dkr.ecr.{}.amazonaws.com/sentiment-pytorch-api'.format(account, region)

In [None]:
model_endpoint = sess.endpoint_from_model_data(model_artifacts, # Where the model artifacts are stored
                                              inference_image,  # Which container to use for inference
                                              1, 'ml.m4.xlarge',# What sort of compute instance to use
                                              role = role)      # Our current role

Now we have created and deployed an endpoint which uses the modified inference code. We will need to know the name of this endpoint when we set up the Lambda function later on. Fortunately, the return value of the `endpoint_from_model_data()` method is the name of the endpoint.

In [None]:
model_endpoint

### Testing our inference code

Now that we have constructed the endpoint it is time to use it. To do so we will first create a predictor object and then send some data to it. To construct the predictor we need the name of the endpoint that we've just created, which we stored in `model_endpoint`. We also need to tell SageMaker the format we will use to send data. Since we want to send a string (the review itself), we set the content type to `text/plain`.

In [None]:
predictor = sage.predictor.RealTimePredictor(model_endpoint, content_type='text/plain')

And lastly, we send some reviews to the endpoint.

In [None]:
import glob

def test_reviews(data_dir='data/aclImdb', stop=250):
    
    results = []
    ground = []
    
    # We make sure to test both positive and negative reviews    
    for sentiment in ['pos', 'neg']:
        
        path = os.path.join(data_dir, 'test', sentiment, '*.txt')
        files = glob.glob(path)
        
        files_read = 0
        
        print('Starting ', sentiment, ' files')
        
        # Iterate through the files and send them to the predictor
        for f in files:
            with open(f) as review:
                # First, we store the ground truth (was the review positive or negative)
                if sentiment == 'pos':
                    ground.append(1)
                else:
                    ground.append(0)
                # Read in the review and convert to 'utf-8' for transmission via HTTP
                review_input = review.read().encode('utf-8')
                # Send the review to the predictor and store the results
                results.append(float(predictor.predict(review_input)))
                
            # Sending reviews to our endpoint one at a time takes a while so we
            # only send a small number of reviews
            files_read += 1
            if files_read == stop:
                break
            
    return ground, results

In [None]:
ground, results = test_reviews()

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(ground, results)
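`accuracy_score` expects discrete labels. If your inference code returns raw probabilities rather than rounded 0/1 values (an assumption about your own `predictor.py`; the version from the previous notebook may already round its output), scikit-learn will reject the mix of binary and continuous values. A minimal sketch that thresholds the scores at 0.5 before comparing them with the ground truth; `thresholded_accuracy` is a hypothetical helper, not part of scikit-learn:

```python
from typing import Sequence

def thresholded_accuracy(ground: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of reviews whose thresholded score matches the 0/1 ground-truth label."""
    if len(ground) != len(scores):
        raise ValueError("ground and scores must have the same length")
    if not ground:
        raise ValueError("no predictions to score")
    # Scores at or above the threshold count as positive (1), the rest as negative (0).
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == g) for p, g in zip(predicted, ground)) / len(ground)
```

Already-rounded outputs such as `1.0` and `0.0` pass through unchanged, so `thresholded_accuracy(ground, results)` gives the same number as `accuracy_score` in that case.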

## Step 3: Exposing our endpoint to the outside world

Currently we have been accessing the model endpoint by constructing a predictor object which uses the endpoint and then using that predictor object to perform inference. What if we wanted to create a web app which accessed our model? The way things are currently set up makes that impossible, since in order to access a SageMaker endpoint the app would first have to authenticate with AWS using an IAM role that includes access to SageMaker endpoints. However, there is an easier way! We just need to use some additional AWS services.

There are two services that we will use to allow access to our model from the outside world. The first is called Lambda and the second is API Gateway.

Lambda is a service which allows someone to write some relatively simple code and have it executed whenever a chosen trigger occurs. For example, you may want to update a database whenever new data is uploaded to a folder stored on S3.

API Gateway is a service that allows you to create HTTP endpoints (URLs) which are connected to other AWS services. One of the benefits of this is that you get to decide what credentials, if any, are required to access these endpoints.

In our case we are going to set up an HTTP endpoint through API Gateway which is open to the public. Then, whenever anyone sends data to our public endpoint we will have that trigger a Lambda function which will send the input (in our case a review) to the inference container and return the result.


### Setting up a Lambda function

The first thing we are going to do is set up a Lambda function. This Lambda function will be executed whenever our public API has data sent to it. When it is executed it will receive the data, perform any sort of processing that is required, send the data (the review) to the SageMaker endpoint we've created and then return the result.

#### Part A: Create an IAM Role for the Lambda function

Since we want the Lambda function to call a SageMaker endpoint, we need to make sure that it has permission to do so. To do this, we will construct a role that we can later give the Lambda function.

Using the AWS Console, navigate to the **IAM** page and click on **Roles**. Then, click on **Create role**. Make sure that the **AWS service** is the type of trusted entity selected and choose **Lambda** as the service that will use this role, then click **Next: Permissions**.

In the search box type `sagemaker` and select the check box next to the **AmazonSageMakerFullAccess** policy. Then, click on **Next: Review**.

Lastly, give this role a name. Make sure you use a name that you will remember later on, for example `LambdaSageMakerRole`. Then, click on **Create role**.
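The **AmazonSageMakerFullAccess** policy is convenient but grants far more than this function needs. If you prefer a narrower role, below is a minimal sketch that builds an inline IAM policy document allowing only `sagemaker:InvokeEndpoint` on a single endpoint. The region, account ID, and endpoint name are placeholders you would substitute, `invoke_only_policy` is a hypothetical helper, and you would likely also want the AWS-managed **AWSLambdaBasicExecutionRole** policy so the function can write logs; check the IAM documentation before relying on this.

```python
import json

def invoke_only_policy(region: str, account_id: str, endpoint_name: str) -> str:
    """Build an IAM policy document (as JSON) that only allows invoking one SageMaker endpoint."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                # Endpoint ARNs have the form arn:aws:sagemaker:<region>:<account>:endpoint/<name>
                "Resource": "arn:aws:sagemaker:{}:{}:endpoint/{}".format(region, account_id, endpoint_name),
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The printed JSON can be pasted into the **JSON** tab when adding an inline policy to the role in the IAM console.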

#### Part B: Create a Lambda function

Now it is time to actually create the Lambda function.

Using the AWS Console, navigate to the AWS Lambda page and click on **Create a function**. When you get to the next page, make sure that **Author from scratch** is selected. Now, name your Lambda function, using a name that you will remember later on, for example `sentiment_analysis_func`. Make sure that the **Python 3.6** runtime is selected and then choose the role that you created in the previous part. Then, click on **Create Function**.

On the next page you will see some information about the Lambda function you've just created. If you scroll down you should see an editor in which you can write the code that will be executed when your Lambda function is triggered. In our example, we will use the code below. Make sure you replace the `**ENDPOINT NAME HERE**` portion with the name of the endpoint that we deployed earlier.

```python
# We need to use the low-level library to interact with SageMaker since the SageMaker API
# is not available natively through Lambda.
import boto3

def lambda_handler(event, context):

    # The SageMaker runtime is what allows us to invoke the endpoint that we've created.
    runtime = boto3.Session().client('sagemaker-runtime')

    # Now we use the SageMaker runtime to invoke our endpoint, sending the review we were given
    response = runtime.invoke_endpoint(EndpointName = '**ENDPOINT NAME HERE**',    # The name of the endpoint we created
                                       ContentType = 'text/plain',                 # The data format that is expected
                                       Body = event['body'])                       # The actual review

    # The response is an HTTP response whose body contains the result of our inference
    result = response['Body'].read().decode('utf-8')

    return {
        'statusCode' : 200,
        'headers' : { 'Content-Type' : 'text/plain', 'Access-Control-Allow-Origin' : '*' },
        'body' : result
    }
```

Once you have copied and pasted the code above into the code editor, click on **Save** and your Lambda function will be up and running. Now we need to create a way for our web app to execute the Lambda function.
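Because the handler above creates its boto3 client inside the function, it is awkward to test without a deployed endpoint. One option, sketched below under the assumption that you are willing to restructure the code slightly, is to pass the runtime client in from outside. `make_handler` and `FakeRuntime` are hypothetical names; the handler body mirrors the Lambda code above, and the fake only imitates the part of the real `invoke_endpoint` response that the handler uses (a `Body` whose `read()` returns bytes).

```python
import io
from typing import Any, Callable, Dict, List

def make_handler(runtime: Any, endpoint_name: str) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Return a Lambda-style handler bound to a given SageMaker runtime client."""
    def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                           ContentType="text/plain",
                                           Body=event["body"])
        result = response["Body"].read().decode("utf-8")
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "text/plain", "Access-Control-Allow-Origin": "*"},
            "body": result,
        }
    return lambda_handler

class FakeRuntime:
    """Stand-in for the boto3 'sagemaker-runtime' client: records calls, returns a canned reply."""
    def __init__(self, reply: bytes):
        self.reply = reply
        self.calls: List[Dict[str, Any]] = []

    def invoke_endpoint(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {"Body": io.BytesIO(self.reply)}

# Local check without AWS:
handler = make_handler(FakeRuntime(b"1.0"), "test-endpoint")
print(handler({"body": "Great movie!"}, None)["body"])  # 1.0
```

In the deployed function you would instead write `lambda_handler = make_handler(boto3.client('sagemaker-runtime'), '**ENDPOINT NAME HERE**')` at module level, which also lets warm invocations reuse the client.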

### Setting up API Gateway

Now that our Lambda function is set up, it is time to create a new API using API Gateway that will trigger the Lambda function we have just created.

Using the AWS Console, navigate to **Amazon API Gateway** and then click on **Get started**.

On the next page, make sure that **New API** is selected and give the new API a name, for example, `sentiment_analysis_api`. Then, click on **Create API**.

Now we have created an API, however it doesn't currently do anything. What we want it to do is to trigger the Lambda function that we created earlier.

Select the **Actions** dropdown menu and click **Create Method**. A new blank method will be created; select **POST** from its dropdown menu, then click on the check mark beside it.

For the integration type, make sure that **Lambda Function** is selected and check the **Use Lambda Proxy integration** box. This option makes sure that the request sent to the API is passed directly to the Lambda function with no processing. It also means that the Lambda function's return value must be a complete response object (a status code, headers, and a string body, as in the code above), since API Gateway will not process it either.
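With proxy integration, the Lambda function receives API Gateway's whole request as the `event` dictionary and must return the whole response. The sketch below shows two small helpers for that contract, assuming the standard proxy event fields `body` and `isBase64Encoded`; the helper names are hypothetical, and the Lambda code above works without them for plain-text requests.

```python
import base64
from typing import Any, Dict

def review_from_proxy_event(event: Dict[str, Any]) -> str:
    """Extract the review text from an API Gateway Lambda-proxy event."""
    body = event.get("body") or ""  # body is None when the request had no payload
    # API Gateway may base64-encode the body (for example, for binary media types).
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return body

def proxy_response(status_code: int, body: str) -> Dict[str, Any]:
    """Build a response dictionary in the shape the Lambda proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "text/plain", "Access-Control-Allow-Origin": "*"},
        "body": str(body),  # the proxy integration expects the body as a string
    }
```

A handler could use these to return, say, `proxy_response(400, "empty review")` instead of calling the endpoint when `review_from_proxy_event(event)` is empty.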

Type the name of the Lambda function you created earlier into the **Lambda Function** text entry box and then click on **Save**. Click on **OK** in the pop-up box that then appears, giving permission to API Gateway to invoke the Lambda function you created.

The last step in creating the API Gateway is to select the **Actions** dropdown and click on **Deploy API**. You will need to create a new Deployment stage and name it anything you like, for example `prod`.

You have now successfully set up a public API to access your SageMaker model. Make sure to copy or write down the URL provided to invoke your newly created public API as this will be needed in the next step. This URL can be found at the top of the page, highlighted in blue next to the text **Invoke URL**.
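Before wiring up the web page, you may want to call the API from Python to confirm that it works. Below is a minimal sketch using only the standard library's `urllib.request`; the function names are hypothetical, and the invoke URL in the usage comment is a placeholder for the one you just copied.

```python
import urllib.request

def build_review_request(invoke_url: str, review: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a plain-text review."""
    return urllib.request.Request(
        invoke_url,
        data=review.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )

def send_review(invoke_url: str, review: str, timeout: float = 30.0) -> str:
    """Send the review to the public API and return the response body as text."""
    with urllib.request.urlopen(build_review_request(invoke_url, review), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Usage (placeholder URL; substitute your own Invoke URL):
# print(send_review("https://<api-id>.execute-api.<region>.amazonaws.com/prod", "This movie was fantastic!"))
```

Splitting request construction from sending keeps the first function easy to inspect without network access.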

## Step 4: Deploying our web app

Now that we have a publicly available API, we can start using it in a web app. For our purposes, we have provided a simple static HTML file which can make use of the public API you created earlier.

In the `website` folder there should be a file called `index.html`. Download the file to your computer and open it in a text editor of your choice. There should be a line which contains **\*\*REPLACE WITH PUBLIC API URL\*\***. Replace this string with the URL that you wrote down in the last step and then save the file.

Now, if you open `index.html` in a web browser on your local computer, the page is loaded directly from your disk (no web server is needed) and you can use the provided site to interact with your SageMaker model through the public API.

If you'd like to go further, you can host this HTML file anywhere you'd like, for example on GitHub Pages or as a static website on Amazon S3. Once you have done this you can share the link with anyone you'd like and have them play with it too!

> **Important Note:** In order for the web app to communicate with the SageMaker endpoint, the endpoint has to actually be deployed and running, which means that you are paying for it. Make sure that the endpoint is running when you want to use the web app, but shut it down when you don't need it; otherwise you will end up with a surprisingly large AWS bill.

### Delete the endpoint

Now that we are done testing our model we need to delete the endpoint so that it is no longer running.

In [None]:
sess.delete_endpoint(model_endpoint)
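Deleting the endpoint stops the billed instance, but the model and endpoint configuration that `endpoint_from_model_data()` registered remain listed in SageMaker. They don't keep compute instances running, so removing them is tidiness rather than cost. A minimal sketch, assuming (as that helper does when no other names are given) that both were registered under the endpoint's name; check the SageMaker console if unsure. `delete_leftover_resources` is a hypothetical helper that takes any client exposing boto3's `delete_endpoint_config` and `delete_model` methods.

```python
from typing import Any

def delete_leftover_resources(sagemaker_client: Any, name: str) -> None:
    """Delete the endpoint configuration and model that are assumed to share the endpoint's name."""
    # Assumption: endpoint_from_model_data() was called without custom names, so the
    # endpoint configuration and the model were registered under the endpoint's name.
    sagemaker_client.delete_endpoint_config(EndpointConfigName=name)
    sagemaker_client.delete_model(ModelName=name)

# Usage in the notebook, after sess.delete_endpoint above has finished:
# import boto3
# delete_leftover_resources(boto3.client('sagemaker'), model_endpoint)
```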