<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# Assessment: Sentiment Analysis with TAO and Riva
Sentiment analysis is a type of text classification, a common NLP task. 
Using a pretrained language model, such as BERT, it is possible to train a text classification model to classify sentences among defined categories.  In the case of sentiment analysis, there are only two categories: positive and negative.

<img src="images/assess/sentiment_analysis.png">

### Table of Contents
[The Problem](#The-Problem)<br>
[Scoring](#Scoring)<br>
[Step 1: Prepare the Project](#Step-1:-Prepare-the-Project)<br>
[Step 2: Train](#Step-2:-Train)<br>
[Step 3: Infer and Evaluate](#Step-3:-Infer-and-Evaluate)<br>
[Step 4: Export Custom Model](#Step-4:-Export-Custom-Model)<br>
[Step 5: Build and Deploy with Riva](#Step-5:-Build-and-Deploy-with-Riva)<br>
[Step 6: Start Riva Services](#Step-6:-Start-Riva-Services)<br>
[Step 7: Submit Your Assessment](#Step-7:-Submit-You-Assessment)<br>

### Notebook Dependencies
The steps in this notebook assume that you have:

1. **NGC Credentials**<br>Be sure you have added your NGC credentials as described in the [NGC Setup notebook](003_Intro_NGC_Setup.ipynb).  If you have restarted the course instance, you will need to repeat this step.

In [None]:
# Start fresh...
# Clear Docker containers
!docker kill $(docker ps -q)
# Check for clean environment - this should be empty
!docker ps

---
# The Problem

### SST-2 Movie Reviews
The [Stanford Sentiment Treebank v2 (SST-2)](https://nlp.stanford.edu/sentiment/index.html) dataset is a corpus of single sentences extracted from movie reviews, each labeled with one of two classes: positive or negative. Your task is to train a model using the dataset and deploy it to Riva, where you can run inference using the Riva API.

### Your Project
You are provided with labeled training and validation datasets, `train_small.tsv` and `dev_small.tsv`, for the project.  There is also a test set, `test.tsv`, for a final test of the model.  All datasets are contained in the `tao/data/SST-2` directory.  You can open any of these files to take a look at the actual data and format:
* [train_small.tsv](tao/data/SST-2/train_small.tsv)
* [dev_small.tsv](tao/data/SST-2/dev_small.tsv)
* [test.tsv](tao/data/SST-2/test.tsv)
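
If you prefer to inspect the data programmatically, the following is a minimal sketch. It assumes each row has the form `sentence<TAB>label` and that the label is the last column; the helper name `summarize_tsv` and the in-memory sample rows are illustrative and not part of the course files. Check the actual files and `label_ids.csv` for the real layout and label meaning.

```python
import csv
import io
from collections import Counter

def summarize_tsv(lines, has_header=False):
    """Return (row count, Counter of labels) for tab-separated 'sentence<TAB>label' rows."""
    # QUOTE_NONE keeps quotation marks inside sentences from confusing the parser
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumption: at most one header row
    labels = Counter()
    rows = 0
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        labels[row[-1].strip()] += 1
        rows += 1
    return rows, labels

# Tiny in-memory sample in the assumed layout (not copied from the real files)
sample = io.StringIO("a delightful coming-of-age story .\t1\nan insult to the intelligence .\t0\n")
print(summarize_tsv(sample))
```

To run it on a real file, pass an open file handle instead, for example `open("tao/data/SST-2/train_small.tsv", newline="", encoding="utf-8")`, and set `has_header=True` if the first row turns out to be a header.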

Your assignment is to train a [text classification model](https://docs.nvidia.com/tao/tao-toolkit/text/nlp/text_classification.html) with TAO using the `tao text_classification` launch command. After training, you must export the custom model, then deploy it using Riva.  

---
# Scoring
You will be assessed on your ability to effectively and efficiently train and deploy the model.  This coding assessment is worth 70 points, divided as follows:

### Rubric

| Step                    | Graded                                               | FIXMEs? | Points |
|-------------------------|------------------------------------------------------|---------|--------|
| 1. Prepare the Project  | Specs and path definitions (spec files are present)  | 1       | 5      |
| 2. Train                | Efficient training parameters (faster training)      | 5       | 15     |
| 3. Infer and Evaluate   | Achieve good inference performance (F1 value >= 88)  | 0       | 10     |
| 4. Export Custom Model  | Export for Riva (model exported in correct format)   | 1       | 12     |
| 5. Build and Deploy     | Riva ServiceMaker (correct models built and loaded)  | 2       | 14     |
| 6. Start Riva           | Riva Server (correct config; models run)             | 1       | 14     |

Although you are capable at this point of building the project without any help, some scaffolding is provided, including specific variable names.  This is for the benefit of the autograder, so please use these constructs for your assessment.  In addition, for some cells, a copy of the latest output from your executed cells is saved in the `my_assessment` directory.  Along the way, there are a few opportunities to check your work to see if you are on the right track. 

Once you are confident that you've built a reliable model, follow the instructions for submission at the end of the notebook.

### Resources and Hints

* **[TAO User's Guide](https://docs.nvidia.com/tao/tao-toolkit/index.html)**<br>
* **[Riva Speech Skills User's Guide](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html)**<br>
* **TAO Example**<br>
Review what you've learned in the [NER Fine-Tuning](007_NLP_Finetune_NER.ipynb) notebook to train, infer, evaluate, and export a TAO model.  The `tao token_classification` commands are very similar to the `tao text_classification` commands.
* **Riva Deployment Example**<br>
Review what you've learned in the [NER Model Deployment with Riva](008_NLP_Deploy_NER.ipynb) notebook to build and deploy the model, as well as start the Riva server.
* **AMP Optimization Level (trainer.amp_level):**<br>
To use mixed precision, set AMP to 'O1' or 'O2'; to train without mixed precision, set it to 'O0'.
* **Precision (trainer.precision):**<br>
To speed up training, you can set the precision to 16 instead of the standard 32 with little or no loss in accuracy.
* **Number of epochs (trainer.max_epochs):**<br>
The project is designed so that you should achieve success on this dataset with only 2 epochs, but feel free to run more. On a Tesla T4, this takes 5-6 minutes if you run it efficiently!
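
The dotted names in these hints (`trainer.amp_level`, `trainer.precision`, `trainer.max_epochs`) refer to nested keys in the YAML spec file, and a `key=value` argument on the command line replaces the value at that location. The sketch below models that idea with a plain dictionary. The helper `apply_override` is hypothetical, the starting values are placeholders rather than the real contents of `train.yaml`, and this is not how TAO implements overrides internally.

```python
def apply_override(config, dotted_key, value):
    """Set config[a][b]...[leaf] = value for a dotted key such as 'a.b.leaf'."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Walk down the nesting, creating intermediate sections if they are missing
        node = node.setdefault(name, {})
    node[leaf] = value
    return config

# Placeholder values for illustration only, not the real train.yaml defaults
config = {"trainer": {"max_epochs": 100, "precision": 32}}
apply_override(config, "trainer.max_epochs", 2)
print(config)
```

Reading `train.yaml` with this picture in mind makes it easier to decide which keys are worth overriding before you start a training run.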

---
# Step 1: Prepare the Project

### Set up Project Paths (not graded)
This block is complete, but feel free to add to it.

In [None]:
# Set the TAO paths for the project
##### TAO mounts (source directory and its destination inside the TAO Docker)
SOURCE_MOUNT = "/dli/task/tao"
DESTINATION_MOUNT = "/workspace/mount"

##### TAO paths - source
# Define location of the SST-2 dataset
DATA = SOURCE_MOUNT + '/data/SST-2'
# Directory where the .riva model is stored
EXPORT_MODEL_LOC = SOURCE_MOUNT + '/results/sst2/export'

##### TAO paths - destination (from the perspective of the TAO Docker)
# The path to the specification YAML 
SPECS_DIR = DESTINATION_MOUNT + '/specs'
# The results are saved at this path by default
RESULTS_DIR = DESTINATION_MOUNT + '/results'
# The data are saved at this path by default
DATA_DIR = DESTINATION_MOUNT + '/data'
# The models are saved at this path by default
MODELS_DIR = DESTINATION_MOUNT + '/models'

# Set your encryption key, and use the same key for all commands. Please use "tlt_encode" if you'd like to deploy the models later with NVIDIA Riva.
KEY = 'tlt_encode'
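
The cell above mixes two views of the same files: paths under `SOURCE_MOUNT` and paths under `DESTINATION_MOUNT` as seen from the TAO Docker. The sketch below translates a destination path back to its source path. It assumes that `SOURCE_MOUNT` is mounted at `DESTINATION_MOUNT` in the container, which the comments above imply but do not state outright; the helper `to_host_path` is hypothetical.

```python
import posixpath

SOURCE_MOUNT = "/dli/task/tao"
DESTINATION_MOUNT = "/workspace/mount"

def to_host_path(container_path):
    """Map a path under DESTINATION_MOUNT to the matching path under SOURCE_MOUNT."""
    rel = posixpath.relpath(container_path, DESTINATION_MOUNT)
    if rel.startswith(".."):
        # The path lies outside the mounted directory, so there is nothing to map
        raise ValueError(f"{container_path} is not under {DESTINATION_MOUNT}")
    return posixpath.normpath(posixpath.join(SOURCE_MOUNT, rel))

# RESULTS_DIR + '/sst2/export' inside the container corresponds to EXPORT_MODEL_LOC
print(to_host_path("/workspace/mount/results/sst2/export"))
```

Under that assumption, a file written by a `tao` command to `$RESULTS_DIR` can be found from the notebook under `SOURCE_MOUNT + '/results'`.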

### Get the Spec Files (graded)
Complete the <i><strong style="color:green;">#FIXME</strong></i> line(s) and run the cell.

In [None]:
import os
from shutil import rmtree

# Delete the specs directory if it already exists
folder = SOURCE_MOUNT + '/specs'
if os.path.exists(folder):
    rmtree(folder)

# Get the text classification task spec files
!tao #FIXME \
    -o $SPECS_DIR/text_classification \
    -r $RESULTS_DIR \
    2>&1|tee my_assessment/step1.txt # DO NOT REMOVE THIS LINE

---
# Step 2: Train
### Run the Trainer (graded)
Review the `train.yaml` file you've just downloaded. Run the trainer in TAO and override YAML config values as necessary.

Complete the <i><strong style="color:green;">#FIXME</strong></i> line(s) and run the cell. Feel free to add/remove override values as you see fit.

In [None]:
%%time
# For BERT training on SST-2:
!tao #FIXME \
    -e $SPECS_DIR/text_classification/train.yaml \
    -g 1  \
    -k $KEY \
    -r $RESULTS_DIR/sst2 \
    training_ds.file_path=#FIXME \
    validation_ds.file_path=#FIXME \
    model.class_labels.class_labels_file=$DATA_DIR/SST-2/label_ids.csv \
    trainer.amp_level=#FIXME \
    trainer.precision=#FIXME \
    trainer.max_epochs=2 \
    2>&1|tee my_assessment/step2.txt # DO NOT REMOVE THIS LINE

The train command produces a model file called `trained-model.tlt` saved at `results/sst2/checkpoints/trained-model.tlt`. 
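
A quick way to confirm that training produced a checkpoint is to search the results directory from the notebook side. This is a minimal sketch; the helper `find_artifacts` is hypothetical, and the search root assumes the results are visible under `/dli/task/tao/results/sst2`.

```python
import os

def find_artifacts(root, suffix):
    """Return the sorted paths under root whose file name ends with suffix."""
    matches = []
    # os.walk yields nothing for a missing directory, so the result is simply empty
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)

# Assumed notebook-side location of the training results
print(find_artifacts("/dli/task/tao/results/sst2", ".tlt"))
```

An empty list means no `.tlt` file was found, in which case the training log in `my_assessment/step2.txt` is the first place to look. The same helper can be reused later with `.riva` or `.rmir` as the suffix.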

---
# Step 3: Infer and Evaluate

### Create the Queries (not graded)
Execute the following cell to create queries for inference.

In [None]:
%%writefile $SOURCE_MOUNT/specs/text_classification/infer.yaml

# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
# TAO Spec file for inference using a previously pretrained BERT model for a text classification task.

# "Simulate" user input: batch with four samples.
input_batch:
- "this is a good script , good dialogue , funny even for adults ."
- "the affectionate loopiness that once seemed congenital to demme s perspective has a tough time emerging from between the badly dated cutesy-pie mystery scenario and the newfangled hollywood post-production effects ."
- " this piece of channel 5 grade trash is , quite frankly , an insult to the intelligence of the true genre enthusiast . "
- "a delightful coming-of-age story ."

### Run Inference on the Trained Model (not graded)

In [None]:
# Run inference on user data:
!tao text_classification infer \
    -e $SPECS_DIR/text_classification/infer.yaml \
    -g 1 \
    -m $RESULTS_DIR/sst2/checkpoints/trained-model.tlt \
    -k $KEY \
    -r $RESULTS_DIR/sst2/infer

### Evaluate your Model (results graded)
Execute the following cell without changes.  Review your output to see whether your F1 result meets the goal of at least 88.  If not, you may need to retrain your model.

In [None]:
# For BERT evaluation on SST-2:
!tao text_classification evaluate \
    -e $SPECS_DIR/text_classification/evaluate.yaml \
    -g 1 \
    -m $RESULTS_DIR/sst2/checkpoints/trained-model.tlt \
    -k $KEY \
    -r $RESULTS_DIR/sst2/eval \
    test_ds.file_path=$DATA_DIR/SST-2/test.tsv \
    test_ds.batch_size=32 \
    test_ds.num_samples=-1 \
    2>&1|tee my_assessment/step3.txt # DO NOT REMOVE THIS LINE
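
As a reminder of what the F1 value measures, the sketch below computes it from raw counts for a single class, expressed as a percentage to match the rubric threshold of 88. The counts are made up for illustration, and the helper `f1_score` is hypothetical. The evaluation output may report more than one F1 figure (for example per class and averaged), and which one the autograder reads is not stated here.

```python
def f1_score(tp, fp, fn):
    """F1 as a percentage from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0  # no correct positive predictions, so precision and recall are zero
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 100.0 * 2 * precision * recall / (precision + recall)

# Made-up counts for illustration only
print(round(f1_score(tp=90, fp=10, fn=14), 2))
```

Because F1 is the harmonic mean of precision and recall, a model cannot reach the threshold by doing well on only one of the two.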

---
# Step 4: Export Custom Model
### Export the Model for Riva (graded)
Complete the <i><strong style="color:green;">#FIXME</strong></i> line(s) and run the cell.

In [None]:
#  For export to Riva:
!tao text_classification export \
    -e $SPECS_DIR/text_classification/export.yaml \
    -g 1 \
    -m $RESULTS_DIR/sst2/checkpoints/trained-model.tlt \
    -k $KEY \
    -r $RESULTS_DIR/sst2/export/ \
    export_format=#FIXME \
    export_to=tc-model.riva \
    2>&1|tee my_assessment/step4.txt # DO NOT REMOVE THIS LINE

In [None]:
# Check your work - does the exported tc-model.riva model exist?
!ls $EXPORT_MODEL_LOC

---
# Step 5: Build and Deploy with Riva
### Set up Project Paths (not graded)
This block is complete, but feel free to add to it.

In [None]:
# Set the Riva paths for the project
WORKSPACE = "/dli/task"

##### Riva Paths
# ServiceMaker Docker
RIVA_SM_CONTAINER = "nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-servicemaker"

# Model output directories
RMIR_LOC = WORKSPACE + "/riva/riva_quickstart/models_repo_assessment/rmir"
RIVA_MODEL_LOC = WORKSPACE + '/riva/riva_quickstart/models_repo_assessment'

# Model Names
EXPORT_MODEL_NAME = "tc-model.riva"  
RMIR_MODEL_NAME = "tc-model.rmir"

# Riva Quick Start 
RIVA_QS = WORKSPACE + "/riva/riva_quickstart"

### Build and Deploy with Riva ServiceMaker (graded)
Complete the <i><strong style="color:green;">#FIXME</strong></i> line(s) and run the cells.

In [None]:
# Syntax: riva-build <task-name> output-dir-for-rmir/model.rmir:key dir-for-riva/model.riva:key
!docker run --rm --gpus 1 \
    -v $EXPORT_MODEL_LOC:/tao \
    -v $RMIR_LOC:/riva \
    $RIVA_SM_CONTAINER -- \
    riva-build #FIXME \
    -f /riva/$RMIR_MODEL_NAME:$KEY /tao/$EXPORT_MODEL_NAME:$KEY \
    2>&1|tee my_assessment/step5.txt # DO NOT REMOVE THIS LINE
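
The file arguments in the cell above combine three pieces: the mount point inside the ServiceMaker container, the file name, and the encryption key after a colon. The sketch below spells out how the two arguments already shown in that cell are assembled; the helper `with_key` is hypothetical and only reproduces the string form.

```python
def with_key(container_dir, file_name, key):
    """Return 'container_dir/file_name:key', the file-plus-key form used in the cell above."""
    return f"{container_dir.rstrip('/')}/{file_name}:{key}"

KEY = "tlt_encode"
# Output file, written under /riva, where $RMIR_LOC is mounted
rmir_arg = with_key("/riva", "tc-model.rmir", KEY)
# Input file, read from /tao, where $EXPORT_MODEL_LOC is mounted
riva_arg = with_key("/tao", "tc-model.riva", KEY)
print(rmir_arg, riva_arg)
```

The paths are always the ones the container sees through its `-v` mounts, so a different mount means a different path prefix for the same file.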

In [None]:
# Check your work - does the built tc-model.rmir model exist?
!ls $RMIR_LOC

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
!docker run --rm --gpus 1 \
    -v $RIVA_MODEL_LOC:/data \
    $RIVA_SM_CONTAINER -- \
    riva-deploy -f #FIXME \
    /data/models/ \
    2>&1|tee -a my_assessment/step5.txt # DO NOT REMOVE THIS LINE

In [None]:
# Check your work - are there optimized models for text classification?
!ls $RIVA_MODEL_LOC/models

---
# Step 6: Start Riva Services

### Configure and Start Riva (graded)
Next, modify the [config.sh](riva/riva_quickstart/config.sh) to enable relevant Riva services. 
In this case, we want to start NLP services, provide the encryption key, and update the path to the model repository (`RIVA_MODEL_LOC`). 
Open the [config.sh](riva/riva_quickstart/config.sh) and make changes where necessary, then start the server.
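
After editing, it can help to list the settings in the file so you can review them in one place before starting the server. The sketch below extracts simple `NAME=value` lines from shell text. The helper `shell_assignments` is hypothetical, the variable names in the sample are placeholders rather than real Riva settings, and multi-line values are only captured up to the end of their first line.

```python
import re

# A simple assignment: optional indentation, a shell variable name, '=', then the raw value
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=(.*)$")

def shell_assignments(text):
    """Return {name: raw_value} for simple NAME=value lines, skipping comment lines."""
    found = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match:
            found[match.group(1)] = match.group(2).strip()
    return found

# Placeholder content for illustration; these are not real config.sh variable names
sample = '# toggle\nexample_flag=true\nexample_path="/some/dir"\n'
print(shell_assignments(sample))
```

To review the real file, read it first, for example with `open("riva/riva_quickstart/config.sh").read()`, and pass the text to the helper.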

In [None]:
# Run Riva Start. This will deploy the model.
!cd $RIVA_QS && bash riva_start.sh config.sh

In [None]:
# Check Riva running services 
!docker logs riva-speech \
    2>&1|tee my_assessment/step6.txt # DO NOT REMOVE THIS LINE

### Riva Service Request (not graded)
Although the model was trained on SST-2 movie reviews, it will likely work in our restaurant domain too.  Give it a try with the following queries or make up your own!

In [None]:
%run my_assessment/sentiment_analysis_client.py --query "I like pizza"
%run my_assessment/sentiment_analysis_client.py --query "I don't like this restaurant"
%run my_assessment/sentiment_analysis_client.py --query "yeah, sounds good"

### Stop Riva Services 

In [None]:
# Shut down Riva 
!bash $RIVA_QS/riva_stop.sh

---
<a id="Step-7:-Submit-You-Assessment"></a>

# Step 7: Submit Your Assessment
How were your results? 

If you are satisfied that you have completed the code correctly, and that your training and deployment are correct, you can submit your project to the autograder as follows:

1. Go back to the GPU launch page and click the checkmark to run the assessment:

<img src="images/assess/assessment_checkmark.png">

2. That's it!  You'll receive your grade feedback in the pop-up window. 

<img src="images/assess/assessment_pass_popup.png">

You can check your assessment progress in the course progress tab.  Note that partial scores for the coding assessment **won't be visible here: the score shows up as either 0 (if you achieve <65) or the full 70 points**.  Be sure to complete the additional questions to qualify for your final certificate!

<img src="images/assess/progress.png">
