![DLI Header](../images/DLI_Header.png)

# Triton Inference Server

[NVIDIA Triton™ Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) delivers fast and scalable AI in production by deploying AI/DL models in a variety of formats. Morpheus provides access to Triton as a back end for performing inference in Morpheus pipelines.

## Objectives

By the end of this notebook you will:

- Have a high-level understanding of Triton.
- Confirm that Triton is ready for use in this interactive environment.
- Know where to find out more about using Triton.

---

## Triton Presentation

Before proceeding, check out this [2-minute introductory video on Triton](https://youtu.be/1kOaYiNVgFs).

---

## Confirm Triton is Ready

As part of this interactive environment, a Triton inference server has already been prepared for use. Triton is running on a host named `triton`. Run the cell below to send an HTTP request to the server's readiness endpoint. A status of `200 OK` indicates that Triton is up and running:

In [1]:
!curl -i triton:8000/v2/health/ready

HTTP/1.1 200 OK
Content-Length: 0
Content-Type: text/plain

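The same readiness check can be made from Python, which is handy when a later notebook cell should only run once Triton is up. The following is a minimal sketch using only the standard library. The helper name `is_triton_ready` and the 5-second timeout are assumptions made for illustration; the host, port, and path are the ones used in the `curl` cell above.

```python
import urllib.error
import urllib.request


def is_triton_ready(base_url: str = "http://triton:8000", timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/v2/health/ready answers with HTTP 200.

    `is_triton_ready` is a hypothetical helper, not part of Triton or Morpheus.
    """
    url = f"{base_url.rstrip('/')}/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and HTTP error statuses all mean
        # "not ready" for the purposes of this check.
        return False
```

In this environment, calling `is_triton_ready()` in a cell should return `True` whenever the `curl` command above reports `200 OK`.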

## Confirm Model Repositories are Loaded

Triton serves trained models from a model repository. We can send an HTTP POST request to the server's repository index endpoint to list the models that are already loaded and ready for us to use:

In [2]:
!curl -s -X POST triton:8000/v2/repository/index | jq

[
  {
    "name": "abp-nvsmi-xgb",
    "version": "1",
    "state": "READY"
  },
  {
    "name": "phishing-bert-onnx",
    "version": "1",
    "state": "READY"
  },
  {
    "name": "sid-minibert-onnx",
    "version": "1",
    "state": "READY"
  }
]

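The repository index can also be queried and filtered from Python. This sketch assumes the response has the same shape as the output above, a JSON list of objects with `name`, `version`, and `state` fields. `fetch_repository_index` and `ready_model_names` are hypothetical helper names; only the endpoint, host, and port come from the cell above.

```python
import json
import urllib.request


def fetch_repository_index(base_url: str = "http://triton:8000", timeout: float = 5.0) -> list:
    """POST to <base_url>/v2/repository/index and return the decoded JSON list.

    Hypothetical helper that mirrors `curl -s -X POST .../v2/repository/index`.
    """
    url = f"{base_url.rstrip('/')}/v2/repository/index"
    request = urllib.request.Request(url, method="POST")  # POST with no body
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def ready_model_names(index: list) -> list:
    """Return the names of index entries whose state is READY, in order."""
    return [entry["name"] for entry in index if entry.get("state") == "READY"]
```

Keeping the filter separate from the network call means it can be applied to any saved copy of the index. In this environment, `ready_model_names(fetch_repository_index())` would be expected to list the three models shown above.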

In the next section we will use the `abp-nvsmi-xgb` model, which ships with the Morpheus toolkit. It is an XGBoost model trained to perform anomalous behavior profiling on GPU-activity logs. That makes it a natural fit for the FIL pipeline, which is designed to perform inference on tree-based models such as XGBoost.
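Before handing `abp-nvsmi-xgb` to a pipeline, it can be useful to confirm that this one model is ready rather than scanning the whole index. The sketch below relies on the per-model readiness route of the KServe v2 HTTP protocol that Triton implements (`/v2/models/<name>/ready`, optionally with `/versions/<version>`). The helper names `model_ready_url` and `is_model_ready` are hypothetical, and the base URL is assumed to be the one used earlier in this notebook.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import quote


def model_ready_url(model_name: str, version: Optional[str] = None,
                    base_url: str = "http://triton:8000") -> str:
    """Build the readiness URL for one model, optionally for one version."""
    path = f"/v2/models/{quote(model_name, safe='')}"
    if version is not None:
        path += f"/versions/{quote(str(version), safe='')}"
    return f"{base_url.rstrip('/')}{path}/ready"


def is_model_ready(model_name: str, version: Optional[str] = None,
                   base_url: str = "http://triton:8000", timeout: float = 5.0) -> bool:
    """Return True if the model's readiness endpoint answers with HTTP 200."""
    url = model_ready_url(model_name, version, base_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Unknown model, unloaded model, or unreachable server.
        return False
```

For example, `is_model_ready("abp-nvsmi-xgb", version="1")` checks the model and version reported in the repository index.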

---

## Additional Triton Resources

Triton ships as part of the Morpheus toolkit, but a deep dive into Triton is beyond the scope of this workshop. If you wish to learn more about Triton, please bookmark the following resources for further study.

- [Deploying a Model for Inference at Production Scale](https://courses.nvidia.com/courses/course-v1:DLI+S-FX-03+V1/about): This interactive self-paced DLI course is the best way to start learning Triton.
- [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server): The main homepage for Triton.
- [NVIDIA Developer Blog](https://developer.nvidia.com/blog/tag/triton/): The NVIDIA Developer Blog has a wealth of resources to help you get started with Triton.

---

## Next

You now have a high-level understanding of how Triton hosts models and performs fast inference, and you have confirmed that it is running in this environment. It's time to use it in a Morpheus pipeline.

Please continue to the next notebook.