In this example, we walk you through how to use NVIDIA Triton Inference Server on Amazon SageMaker to deploy a HuggingFace T5 NLP model for text translation. In particular, the example uses:
- T5-small HuggingFace PyTorch translation model (served using Triton's Python Backend)
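For orientation, a Triton Python Backend model is implemented as a `model.py` that defines a `TritonPythonModel` class; the notebook ships its own version of this file. The sketch below is a minimal, illustrative handler for T5-small, assuming hypothetical tensor names `INPUT_TEXT` and `OUTPUT_TEXT` (the actual names come from the model's `config.pbtxt` in the notebook). Note that `triton_python_backend_utils` is only available inside the Triton container.

```python
# model.py -- minimal sketch of a Triton Python Backend handler for T5-small.
# Tensor names INPUT_TEXT / OUTPUT_TEXT are illustrative; they must match
# the config.pbtxt that the notebook generates.
import numpy as np
import triton_python_backend_utils as pb_utils  # available inside the Triton container
from transformers import T5ForConditionalGeneration, T5Tokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Load tokenizer and model once, when Triton loads the model.
        self.tokenizer = T5Tokenizer.from_pretrained("t5-small")
        self.model = T5ForConditionalGeneration.from_pretrained("t5-small")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Input strings arrive as a BYTES tensor of UTF-8 byte objects.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT_TEXT")
            texts = [t.decode("utf-8") for t in in_tensor.as_numpy().flatten()]

            # T5 expects a task prefix in the input text,
            # e.g. "translate English to German: ...".
            inputs = self.tokenizer(texts, return_tensors="pt", padding=True)
            output_ids = self.model.generate(
                inputs.input_ids, attention_mask=inputs.attention_mask
            )
            outputs = self.tokenizer.batch_decode(
                output_ids, skip_special_tokens=True
            )

            out_tensor = pb_utils.Tensor(
                "OUTPUT_TEXT", np.array(outputs, dtype=object)
            )
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses
```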
- Launch a SageMaker notebook instance with a `g5.xlarge` instance type. This example can also be run on a SageMaker Studio notebook, but the steps that follow focus on the notebook instance.
  - IMPORTANT: In Notebook instance settings, under Additional configuration, set Volume size in GB to at least 100 GB.
- For Git repositories, select the option `Clone a public git repository to this notebook instance only` and specify the Git repository URL.
- Once JupyterLab is ready, launch the `t5_pytorch_python-backend.ipynb` notebook with the `conda_python3` kernel and run through it to learn how to host the T5 NLP model on a `g5.2xlarge` GPU instance behind a SageMaker endpoint.
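Once the notebook has deployed the endpoint, it can be invoked from any boto3 client using Triton's KServe v2 JSON request format. The sketch below is illustrative only: the endpoint name `t5-triton-endpoint` is hypothetical, the tensor names reuse the assumptions from the handler sketch above, and the actual payload shape and names are defined by the notebook's `config.pbtxt`.

```python
# Illustrative client-side invocation of the deployed Triton endpoint.
# Endpoint name and tensor names are assumptions; use the values
# produced by the notebook.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Triton accepts inference requests in the KServe v2 JSON format.
payload = {
    "inputs": [
        {
            "name": "INPUT_TEXT",  # hypothetical tensor name
            "shape": [1],
            "datatype": "BYTES",
            "data": ["translate English to German: The house is wonderful."],
        }
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="t5-triton-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
result = json.loads(response["Body"].read())
print(result["outputs"][0]["data"])  # translated text
```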