# Getting Started With TorchServe

This introductory lab will take you through the following hands-on exercises:
* [Install TorchServe](https://github.com/pytorch/serve#install-torchserve) and its dependencies on an Amazon SageMaker Notebook
* Create a model store
* Download and use the Torch Model Archiver
* Start the TorchServe server
* Register/Unregister Models
* Perform Inference using TorchServe
* Host Multiple Models and Scale Workers
* Stop the TorchServe server

## Preparing the TorchServe environment on SageMaker
Let's start by using [Amazon Corretto](https://aws.amazon.com/corretto/) to install the Java 11 dependency:

In [None]:
%%bash
sudo rpm --import https://yum.corretto.aws/corretto.key 
sudo curl -L -o /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo
sudo yum install -y java-11-amazon-corretto-devel

You can verify the Java 11 version and that ``JAVA_HOME`` is properly set by running the following:

In [None]:
!java -version
!echo $JAVA_HOME
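
The same check can be run from Python, which is handy if a later cell needs to branch on the result. This is a minimal sketch: the helper name `java_environment` is hypothetical, and it only reports where `java` resolves and what ``JAVA_HOME`` holds, without inspecting the version.

```python
import os
import shutil


def java_environment(env=None):
    """Report where `java` resolves on PATH and what JAVA_HOME is set to."""
    env = os.environ if env is None else env
    return {
        # Full path of the `java` executable, or None if it is not on PATH.
        "java_path": shutil.which("java", path=env.get("PATH")),
        # Value of JAVA_HOME, or None if the variable is unset.
        "java_home": env.get("JAVA_HOME"),
    }


print(java_environment())
```

A `None` in either field suggests the Corretto install step should be rerun or ``JAVA_HOME`` exported before starting TorchServe.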

Next, use ``pip`` to install TorchServe and the model archiver:

In [None]:
!pip install torch torchtext torchvision sentencepiece psutil future
!pip install torchserve torch-model-archiver

If a ``serve`` subdirectory already exists, remove it, then clone the TorchServe repository into a new ``serve`` subdirectory.

In [None]:
%%bash
if [ -d "serve" ]; then
    rm -r -f serve
fi
git clone https://github.com/pytorch/serve.git serve

## Storing a model

To store archived models, we need a model store. Create a ``model_store`` directory, which TorchServe later references through the ``--model-store`` parameter when it starts. If the ``model_store`` subdirectory already exists, remove the files in it; otherwise, create it.

In [None]:
%%bash
if [ -d "model_store" ]; then
    rm -f model_store/*    
else
    mkdir model_store
fi
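
For readers who prefer to stay in Python, the same create-or-empty logic can be sketched with `pathlib`. The function name `reset_model_store` is hypothetical; the behavior roughly mirrors the shell cell above, leaving hidden files and subdirectories in place.

```python
from pathlib import Path


def reset_model_store(path="model_store"):
    """Create the model store, or remove the files already inside it."""
    store = Path(path)
    if store.is_dir():
        # Roughly mirror `rm -f model_store/*`: delete visible regular files,
        # leave hidden files and subdirectories untouched.
        for entry in store.iterdir():
            if entry.is_file() and not entry.name.startswith("."):
                entry.unlink()
    else:
        store.mkdir()
    return store
```

Calling `reset_model_store()` with no arguments targets ``model_store`` in the current working directory.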

This lab uses a pre-trained model so that we can focus on serving rather than training. Next, we will download the pre-trained DenseNet-161 weights. DenseNet is one of the PyTorch TorchVision [models](https://pytorch.org/docs/stable/torchvision/models.html). You can read more about it at [arxiv.org](https://arxiv.org/abs/1608.06993) or on [Kaggle](https://www.kaggle.com/pytorch/densenet161).

In [None]:
%%bash
wget -q https://download.pytorch.org/models/densenet161-8d451a50.pth
ls *.pth

Now that we have the model, we will archive it by using the [TorchServe Model Archiver](https://github.com/pytorch/serve/blob/master/model-archiver/README.md). Note the arguments it accepts, including the model name, the version, the model file, the serialized weights, any extra files, and the handler.

In [None]:
# In our example, we reference the serialized densenet model we just downloaded
!torch-model-archiver \
   --model-name densenet161 \
   --version 1.0 \
   --model-file serve/examples/image_classifier/densenet_161/model.py \
   --serialized-file densenet161-8d451a50.pth \
   --extra-files serve/examples/image_classifier/index_to_name.json \
   --handler image_classifier

!ls *.mar

We then move the archived model into the ``model_store`` subdirectory.

In [None]:
!mv *.mar model_store
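
As an alternative to moving the archive by hand, the archiver accepts an `--export-path` option that writes the `.mar` file directly into a chosen directory. The sketch below only builds the argument list; the helper name `archiver_command` is hypothetical, and the paths are the ones used earlier in this lab.

```python
import shlex


def archiver_command(model_name, version, model_file, serialized_file,
                     handler, extra_files=None, export_path="model_store"):
    """Build a torch-model-archiver argument list that writes into export_path."""
    cmd = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--model-file", model_file,
        "--serialized-file", serialized_file,
        "--handler", handler,
        "--export-path", export_path,
    ]
    if extra_files:
        # --extra-files takes a single comma-separated list of paths.
        cmd += ["--extra-files", ",".join(extra_files)]
    return cmd


cmd = archiver_command(
    "densenet161", "1.0",
    "serve/examples/image_classifier/densenet_161/model.py",
    "densenet161-8d451a50.pth",
    "image_classifier",
    extra_files=["serve/examples/image_classifier/index_to_name.json"],
)
# Print the command in a form that could be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to run the archiver from a Python cell.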

## Starting TorchServe
With the model archived and sitting in our model store, we can now start the TorchServe server in the background. 

In [None]:
%%bash
torchserve --start --model-store model_store --models densenet161=densenet161.mar </dev/null &>/dev/null &

The [Inference API](https://github.com/pytorch/serve/blob/master/docs/inference_api.md) listens on port 8080 by default. Now that the server has started, let's run a health check on the TorchServe process. The status returned by the following endpoint should read "Healthy".

In [None]:
!curl http://localhost:8080/ping
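
Because the server was started in the background, the first ping can arrive before TorchServe is ready. One way to handle that is to poll until the status reads "Healthy". This is a minimal sketch: the function names, the retry count, and the delay are illustrative choices, not TorchServe settings.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body):
    """Return True if a /ping response body reports the status "Healthy"."""
    try:
        return json.loads(body).get("status") == "Healthy"
    except (ValueError, AttributeError):
        # Empty, non-JSON, or non-object bodies count as not healthy.
        return False


def ping(url="http://localhost:8080/ping", timeout=2.0):
    """Fetch the ping endpoint; return the body as text, or "" if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        return ""


def wait_until_healthy(probe=ping, attempts=30, delay=1.0):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_healthy(probe()):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` in a cell returns `True` once the server responds, or `False` after the attempts are exhausted.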

Because we placed the ``densenet161`` model archive in the model store and named it in the ``--models`` argument, it was served as soon as the server started up. We can verify this by calling the [Management API](https://github.com/pytorch/serve/blob/master/docs/management_api.md).

In [None]:
!curl http://localhost:8081/models
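
To check the result programmatically rather than by eye, the response body can be parsed as JSON. The sketch below assumes the response shape described in the Management API documentation (a `models` list whose entries carry `modelName` and `modelUrl`); the sample body is illustrative, not captured output.

```python
import json


def registered_model_names(body):
    """Extract model names from a GET /models response body."""
    payload = json.loads(body)
    return [entry["modelName"] for entry in payload.get("models", [])]


# Illustrative response body, assumed to follow the documented shape.
sample = '{"models": [{"modelName": "densenet161", "modelUrl": "densenet161.mar"}]}'
print(registered_model_names(sample))
```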

## Performing Inference
To test the TorchServe model server, you just need to send a request to the Inference API. Let's start by pulling down an image of a [Proboscis Monkey](https://en.wikipedia.org/wiki/Proboscis_monkey) and a [Tiger Beetle](https://en.wikipedia.org/wiki/Tiger_beetle).
<img src="https://torchserve-workshop.s3.amazonaws.com/proboscis-monkey-tiger-beetle-grouped.png">


In [None]:
!curl -O https://torchserve-workshop.s3.amazonaws.com/proboscis-monkey.jpg
!curl -O https://torchserve-workshop.s3.amazonaws.com/tiger-beetle.jpg

Now that we have a couple of images, we can use ``curl`` to send a ``POST`` request with each image to the TorchServe predictions endpoint. The endpoint returns its prediction response as JSON. For both the Proboscis Monkey and the Tiger Beetle, we see several predicted classes, each with an associated confidence score.

In [None]:
!curl -X POST http://localhost:8080/predictions/densenet161 -T proboscis-monkey.jpg


In [None]:
!curl -X POST http://localhost:8080/predictions/densenet161 -T tiger-beetle.jpg
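
The same request can be issued from Python with the standard library. The sketch below is hedged in two places. First, it sets an explicit `application/octet-stream` content type on the assumption that the server should treat the body as raw bytes, since `urllib` would otherwise label it as form data. Second, the response parser accepts either a single JSON object of label/score pairs or a list of one-entry objects, because the exact shape may differ between TorchServe versions. The function names are hypothetical.

```python
import json
import urllib.request


def build_prediction_request(image_bytes, model="densenet161",
                             base_url="http://localhost:8080"):
    """Build a POST request that carries raw image bytes as the body."""
    return urllib.request.Request(
        f"{base_url}/predictions/{model}",
        data=image_bytes,
        # Assumption: mark the body as raw bytes rather than form data.
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def top_prediction(body):
    """Return the (label, score) pair with the highest score."""
    payload = json.loads(body)
    if isinstance(payload, list):
        # Assumed alternative shape: [{"label": score}, ...]
        merged = {}
        for item in payload:
            merged.update(item)
        payload = merged
    return max(payload.items(), key=lambda pair: pair[1])
```

With the server running, `urllib.request.urlopen(build_prediction_request(open("tiger-beetle.jpg", "rb").read()))` sends the image, and the body of the response can be passed to `top_prediction`.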

## Hosting Multiple Models and Scaling Workers
TorchServe provides a Management API to list registered models, register new models on a running server, unregister models, increase or decrease the number of workers per model, describe the status of a model, add versions, and set default versions. The Management API listens on port 8081 by default, but you can change this default.

Let's start by downloading a new model. For this example, we will use a pre-trained Faster R-CNN model.

In [None]:
!wget https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
!torch-model-archiver --model-name fastrcnn --version 1.0 \
--model-file serve/examples/object_detector/fast-rcnn/model.py \
--serialized-file fasterrcnn_resnet50_fpn_coco-258fb6c6.pth \
--handler object_detector \
--extra-files serve/examples/object_detector/index_to_name.json

As we have previously done, let's move the model to the model store and then verify it is in the correct directory.

In [None]:
!mv fastrcnn.mar model_store
!ls -l ./model_store

Now we can register the new model.

In [None]:
!curl -X POST "http://localhost:8081/models?url=fastrcnn.mar"
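
The register call takes its options as query parameters. As a hedged sketch, the helper below (a hypothetical name) builds the URL with two further parameters from the Management API documentation, `initial_workers` and `synchronous`, so that workers can be requested at registration time instead of in a separate step.

```python
from urllib.parse import urlencode


def register_url(mar_file, initial_workers=0, synchronous=False,
                 base_url="http://localhost:8081"):
    """Build the Management API URL that registers a model archive."""
    query = urlencode({
        "url": mar_file,
        "initial_workers": initial_workers,
        # The API expects lowercase true/false.
        "synchronous": str(synchronous).lower(),
    })
    return f"{base_url}/models?{query}"


print(register_url("fastrcnn.mar", initial_workers=1, synchronous=True))
```

The resulting URL is used with a `POST` request, exactly as in the `curl` command above.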

Then query the list of registered models to verify that our pre-trained Faster R-CNN model is also being served.

In [None]:
!curl "http://localhost:8081/models"

Next, let's scale the workers for our model. A model registered through the Management API has no workers assigned to it by default, so here we set a minimum number of workers.

Note: If your model is hosted on a CPU with many cores, you can scale the number of workers higher.

In [None]:
!curl -v -X PUT "http://localhost:8081/models/fastrcnn?min_worker=2"
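
From Python, a `PUT` request needs the method set explicitly. The sketch below builds such a request with `urllib`; the helper name is hypothetical, and `max_worker` and `synchronous` are optional parameters described in the Management API documentation. Setting `synchronous` to true is assumed to make the call return only after the workers have been adjusted.

```python
import urllib.request
from urllib.parse import urlencode


def scale_workers_request(model, min_worker, max_worker=None, synchronous=True,
                          base_url="http://localhost:8081"):
    """Build a PUT request that sets the worker counts for a registered model."""
    params = {"min_worker": min_worker, "synchronous": str(synchronous).lower()}
    if max_worker is not None:
        params["max_worker"] = max_worker
    url = f"{base_url}/models/{model}?{urlencode(params)}"
    return urllib.request.Request(url, method="PUT")


req = scale_workers_request("fastrcnn", min_worker=2)
print(req.get_method(), req.full_url)
```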

With the workers updated, we can verify the change by describing the model:

In [None]:
!curl "http://localhost:8081/models/fastrcnn"
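
The describe response lists each worker along with its status, so the scaling result can be checked in code. This sketch assumes the shape shown in the Management API documentation: a JSON list with one entry per model version, each holding a `workers` list whose items have a `status` field. The sample body is illustrative, not captured output.

```python
import json


def ready_worker_count(body):
    """Count workers reporting status "READY" in a describe-model response."""
    payload = json.loads(body)
    return sum(
        1
        for version in payload
        for worker in version.get("workers", [])
        if worker.get("status") == "READY"
    )


# Illustrative body; the field names are assumed from the Management API docs.
sample = json.dumps([{
    "modelName": "fastrcnn",
    "minWorkers": 2,
    "workers": [
        {"id": "9000", "status": "READY"},
        {"id": "9001", "status": "READY"},
    ],
}])
print(ready_worker_count(sample))
```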

Next, we can unregister both models now that they no longer need to be served for inference.

In [None]:
!curl -X DELETE http://localhost:8081/models/fastrcnn/
!curl -X DELETE http://localhost:8081/models/densenet161/

You can verify that the models were unregistered by querying the API once again.

In [None]:
!curl "http://localhost:8081/models"

Finally, when you have completed running inferences, you may stop the server by executing the torchserve command with the ``--stop`` flag.

In [None]:
!torchserve --stop

## Cleanup
The next step removes files created during this lab.

In [None]:
!chmod +x ./scripts/cleanup.sh
!./scripts/cleanup.sh