---
page_type: sample
languages:
- python
- azurecli
products:
- azure-machine-learning
description: Learn how to efficiently deploy to GPUs with the [Triton inference server](https://github.com/triton-inference-server/server) and Azure ML.
experimental: in preview
---

# Real-time inference on GPUs in Azure Machine Learning

Note: this tutorial is experimental and prone to failure.

The notebooks in this directory show how to take advantage of the interoperability between Azure Machine Learning and NVIDIA Triton Inference Server for cost-effective, real-time inference on GPUs.

## Python instructions

Open either of the sample notebooks in this directory to run Triton in Python.
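
If you prefer a script to the notebooks, the deployment they perform looks roughly like the following Azure ML Python SDK (v1) sketch. The names `bidaf-model`, `aks-gpu-deploy`, and `triton-webservice` mirror the CLI steps below; the notebooks themselves are the authoritative reference.

```python
from azureml.core import Workspace, Model
from azureml.core.compute import AksCompute
from azureml.core.webservice import AksWebservice

# Connect to the workspace described by a local config.json
ws = Workspace.from_config()

# Register the local Triton model repository; Framework.MULTI selects
# Azure ML's no-code Triton deployment path
model = Model.register(
    workspace=ws,
    model_path="models/triton",
    model_name="bidaf-model",
    model_framework=Model.Framework.MULTI,
)

# Deploy to an existing GPU-enabled AKS cluster
aks_target = AksCompute(ws, "aks-gpu-deploy")
service = Model.deploy(
    workspace=ws,
    name="triton-webservice",
    models=[model],
    deployment_config=AksWebservice.deploy_configuration(),
    deployment_target=aks_target,
)
service.wait_for_deployment(show_output=True)
```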

## CLI instructions

You must have the latest version of the Azure Machine Learning CLI installed to run these commands. Follow the instructions in the Azure Machine Learning CLI documentation to download or upgrade it.
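
If you are on the CLI v1 machine learning extension, one way to install or update it is through `az extension` (a sketch; the documentation referenced above is authoritative):

```bash
# Install the Azure ML CLI extension, or bring an existing install up to date
az extension add -n azure-cli-ml
az extension update -n azure-cli-ml
```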

```bash
# Download the model files into models/triton (see src/model_utils.py)
python src/model_utils.py
# Register the model; --model-framework=Multi selects Triton no-code deployment
az ml model register -p models/triton -n bidaf-model --model-framework=Multi
# Deploy model version 1 to the GPU-enabled AKS cluster
az ml model deploy -n triton-webservice -m bidaf-model:1 --dc deploymentconfig.json --compute-target aks-gpu-deploy
```

Once you have deployed, try querying the service's readiness endpoint:

```bash
# Get the scoring URI
az ml service show --name triton-webservice
# Get the authentication keys
az ml service get-keys --name triton-webservice
# Check that the Triton server is ready to receive requests
curl -H "Authorization: Bearer <primaryKey>" -v <scoring-uri>/v2/health/ready
```
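
The same v2 REST surface also exposes server and model metadata. The endpoints below follow the KFServing v2 protocol; `<model-name>` is a placeholder for whatever name the model has in the Triton model repository:

```bash
# Server metadata: name, version, and supported protocol extensions
curl -H "Authorization: Bearer <primaryKey>" <scoring-uri>/v2
# Metadata (inputs, outputs, platform) for a single model
curl -H "Authorization: Bearer <primaryKey>" <scoring-uri>/v2/models/<model-name>
```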

Read more about the KFServing v2 predict API in its protocol specification.
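
For reference, an inference call against that API from Python might look like the sketch below. The model name, tensor name, shape, and datatype are placeholders; the real values must match what the model metadata endpoint reports for the deployed model.

```python
import requests

# Placeholders: take <scoring-uri> and <primaryKey> from the
# `az ml service` commands above
SCORING_URI = "<scoring-uri>"
PRIMARY_KEY = "<primaryKey>"

headers = {
    "Authorization": f"Bearer {PRIMARY_KEY}",
    "Content-Type": "application/json",
}
# Request body per the KFServing v2 predict API; this tensor is purely
# illustrative and does not match any particular model
body = {
    "inputs": [
        {"name": "<input-name>", "shape": [1], "datatype": "BYTES", "data": ["hello"]}
    ]
}

resp = requests.post(
    f"{SCORING_URI}/v2/models/<model-name>/infer", headers=headers, json=body
)
print(resp.status_code, resp.json())
```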