Multi-Instance Deployment of ResNet50 PyTorch Model

Overview

This example presents a multi-instance deployment of a ResNet50 PyTorch model. The model is deployed multiple times, which improves throughput when a single instance leaves the GPU underutilized. By default, the model is deployed twice on the same GPU.
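Conceptually, multi-instance serving dispatches each incoming request to whichever identical model copy is free, so independent requests overlap instead of queueing behind a single instance. A minimal stdlib sketch of that idea (plain Python threads stand in for the two GPU-resident instances; this is an illustration of the scheduling concept, not PyTriton code):

```python
# Illustration only: two identical "model instances" fed from one pool.
# In the real example, Triton performs this scheduling across two copies
# of the ResNet50 model on the same GPU.
from concurrent.futures import ThreadPoolExecutor


def infer(batch):
    # Stand-in for one model instance's forward pass.
    return [x * 2 for x in batch]


def serve(requests, num_instances=2):
    # Each batch goes to whichever instance is free, so independent
    # batches can run concurrently instead of serially.
    with ThreadPoolExecutor(max_workers=num_instances) as pool:
        return list(pool.map(infer, requests))


print(serve([[1, 2], [3, 4], [5, 6]]))  # [[2, 4], [6, 8], [10, 12]]
```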

The example consists of the following scripts:

  • install.sh - installs the additional dependencies for downloading the model from HuggingFace
  • server.py - starts the model with Triton Inference Server
  • client.sh - executes Perf Analyzer to measure the performance

Requirements

The example requires the torch package. It can be installed in your current environment using pip:

pip install torch

Or you can use the NVIDIA PyTorch container:

docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:23.07-py3 bash

If you choose to use the container, we recommend installing the NVIDIA Container Toolkit.

Quick Start

The step-by-step guide:

  1. Install PyTriton following the installation instructions.
  2. Install the additional packages using install.sh:
./install.sh
  3. In the current terminal, start the model on Triton using server.py:
./server.py
  4. Open a new terminal tab (e.g. Ctrl + T on Ubuntu) or window.
  5. Go to the example directory.
  6. Run client.sh to run the performance measurement on the model:
./client.sh
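For reference, a Perf Analyzer invocation for a setup like this might look as follows. This is a hypothetical sketch: the actual flags used by the example live in client.sh, and the model name and concurrency values below are assumptions.

```shell
# Sweep client concurrency so that both model instances receive work,
# then compare throughput against a single-instance deployment.
perf_analyzer \
    -m ResNet50 \
    -b 1 \
    --concurrency-range 2:8:2 \
    --measurement-interval 5000
```

Increasing concurrency beyond 1 is what lets the second instance contribute; with a single concurrent client, the extra instance mostly sits idle.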