# Deploying yolort on ONNX Runtime


The ONNX model exported by yolort differs from those of other pipelines in three ways.

- We embed the pre-processing (mainly `letterbox`) into the graph, so the exported model expects a `Tensor[C, H, W]` in `RGB` channel order, stored as `float32` and rescaled to the range `[0, 1]`.
- We embed the post-processing into the model graph with `torchvision.ops.batched_nms`, so the exported model directly outputs the `boxes`, `labels` and `scores` fields for the image.
- We adopt the dynamic shape mechanism to export the ONNX models.
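
To make the `letterbox` step concrete, here is a minimal sketch of the geometry behind an aspect-preserving resize onto a fixed canvas. The function name `letterbox_geometry` and the 1080x810 input size are illustrative assumptions, not part of yolort, and the embedded implementation may pad differently (for example only up to a stride multiple).

```python
def letterbox_geometry(height, width, target=(640, 640)):
    """Return the resized (height, width) and the (vertical, horizontal) padding.

    Hypothetical helper for illustration; not the yolort implementation.
    """
    target_h, target_w = target
    # A single scale for both axes keeps the aspect ratio intact.
    scale = min(target_h / height, target_w / width)
    new_h = round(height * scale)
    new_w = round(width * scale)
    # Whatever is left of the target canvas is filled with padding.
    pad_h = target_h - new_h
    pad_w = target_w - new_w
    return (new_h, new_w), (pad_h, pad_w)


# Assumed example: a portrait image that is 1080 pixels high and 810 wide.
resized, padding = letterbox_geometry(1080, 810)
print(resized, padding)  # (640, 480) (0, 160)
```

Because this step lives inside the exported graph, callers pass the image at its original resolution and do not resize it themselves.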

## Set up environment and function utilities

To run this tutorial, first install ONNX Runtime. See the ONNX Runtime [installation matrix](https://onnxruntime.ai) for the recommended instructions for your combination of target operating system, hardware, accelerator, and language.

On x64, the quickest option is to install it with pip:

```bash
pip install onnxruntime
```

In [1]:
import os
import torch

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

device = torch.device('cpu')

In [2]:
import cv2
import onnx
import onnxruntime

from yolort.models import YOLOv5
from yolort.v5 import attempt_download

from yolort.utils import get_image_from_url, read_image_to_tensor
from yolort.utils.image_utils import to_numpy

Define the parameters used to build the model, export it to ONNX, and run inference with ONNX Runtime.

In [3]:
img_size = 640
size = (img_size, img_size)  # Used for pre-processing
size_divisible = 64
score_thresh = 0.35
nms_thresh = 0.45
opset_version = 11
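
As a rough illustration of what `score_thresh` and `nms_thresh` control, the sketch below filters candidates by score and then applies greedy non-maximum suppression in pure Python. It is a single-class simplification written for this tutorial: the exported graph uses `torchvision.ops.batched_nms`, which works per class, and the helper names `iou` and `filter_detections`, the strict `>` score comparison, and the sample boxes are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, score_thresh=0.35, nms_thresh=0.45):
    """Return the indices of the kept boxes, highest score first."""
    # Step 1: drop low-confidence candidates.
    candidates = [i for i, s in enumerate(scores) if s > score_thresh]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    # Step 2: greedy NMS. A box survives only if it does not overlap
    # an already kept, higher-scoring box by more than nms_thresh.
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep


# Made-up boxes: 0 and 1 overlap heavily, 2 is separate, 3 has a low score.
boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.2]
kept = filter_detections(boxes, scores)
print(kept)  # [0, 2]
```

Raising `score_thresh` removes more weak candidates before suppression, while lowering `nms_thresh` makes overlapping boxes more likely to be merged into a single detection.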

Get the images for inference.

In [4]:
img_src1 = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/bus.jpg"
img_one = get_image_from_url(img_src1)
img_one = read_image_to_tensor(img_one, is_half=False)
img_one = img_one.to(device)

img_src2 = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/zidane.jpg"
img_two = get_image_from_url(img_src2)
img_two = read_image_to_tensor(img_two, is_half=False)
img_two = img_two.to(device)
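
The input contract described at the top (channels first, `RGB` order, values in `[0, 1]`) can be sketched without any dependencies. The helper below is a hypothetical pure-Python illustration that assumes the source image is an OpenCV-style `H x W x 3` array of 8-bit `BGR` pixels; it is not the body of `read_image_to_tensor`.

```python
def bgr_hwc_to_rgb_chw(image):
    """Convert nested lists shaped [H][W][3] (BGR, 0-255) to [3][H][W] (RGB, 0-1)."""
    height, width = len(image), len(image[0])
    return [
        # Output channel c (R, G, B) reads input channel 2 - c (B, G, R reversed).
        [[image[y][x][2 - c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(3)
    ]


# A 1x2 image: one pure-blue pixel and one pure-red pixel, in BGR order.
bgr = [[[255, 0, 0], [0, 0, 255]]]
chw = bgr_hwc_to_rgb_chw(bgr)
print(chw[0])  # red channel:  [[0.0, 1.0]]
print(chw[2])  # blue channel: [[1.0, 0.0]]
```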

## Load a model trained with yolov5

The model used below is an official yolov5 release trained on the COCO 2017 dataset.

In [5]:
# yolov5n6.pt is downloaded from 'https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5n6.pt'
model_path = "yolov5n6.pt"
onnx_path = "yolov5n6.onnx"
checkpoint_path = attempt_download(model_path)

In [6]:
model = YOLOv5.load_from_yolov5(
    model_path,
    size=size,
    size_divisible=size_divisible,
    score_thresh=score_thresh,
    nms_thresh=nms_thresh,
)

model = model.eval()
model = model.to(device)


                 from  n    params  module                                  arguments                     
  0                -1  1      1760  yolort.v5.models.common.Conv            [3, 16, 6, 2, 2]              
  1                -1  1      4672  yolort.v5.models.common.Conv            [16, 32, 3, 2]                
  2                -1  1      4800  yolort.v5.models.common.C3              [32, 32, 1]                   
  3                -1  1     18560  yolort.v5.models.common.Conv            [32, 64, 3, 2]                
  4                -1  2     29184  yolort.v5.models.common.C3              [64, 64, 2]                   
  5                -1  1     73984  yolort.v5.models.common.Conv            [64, 128, 3, 2]               
  6                -1  3    156928  yolort.v5.models.common.C3              [128, 128, 3]                 
  7                -1  1    221568  yolort.v5.models.common.Conv            [128, 192, 3, 2]              
  8                -1  1    167040  

### Inference on PyTorch backend

In [7]:
images = [img_one]

In [8]:
with torch.no_grad():
    model_out = model(images)

In [9]:
%%timeit
with torch.no_grad():
    model_out = model(images)

The slowest run took 5.09 times longer than the fastest. This could mean that an intermediate result is being cached.
115 ms ± 71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [10]:
model_out[0]['boxes']

tensor([[ 32.27846, 225.15266, 811.47729, 740.91071],
        [ 50.42178, 387.48898, 241.54399, 897.61041],
        [219.03331, 386.14346, 345.77689, 869.02582],
        [678.05023, 374.65326, 809.80334, 874.80621]])

In [11]:
model_out[0]['scores']

tensor([0.88238, 0.84486, 0.72629, 0.70077])

In [12]:
model_out[0]['labels']

tensor([5, 0, 0, 0])
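
For readers who want to consume these three fields, here is a small sketch that pairs them up using the values printed above. It assumes the boxes are in `(x1, y1, x2, y2)` pixel coordinates and that the labels index the 80-class COCO list used by YOLOv5, where 0 is `person` and 5 is `bus`; the partial `COCO_NAMES` table and the `describe` helper are hypothetical.

```python
# Partial, hypothetical lookup table covering only the labels seen above.
COCO_NAMES = {0: "person", 5: "bus"}

boxes = [
    [32.27846, 225.15266, 811.47729, 740.91071],
    [50.42178, 387.48898, 241.54399, 897.61041],
    [219.03331, 386.14346, 345.77689, 869.02582],
    [678.05023, 374.65326, 809.80334, 874.80621],
]
scores = [0.88238, 0.84486, 0.72629, 0.70077]
labels = [5, 0, 0, 0]


def describe(boxes, scores, labels, names):
    """Turn parallel lists into one record per detection."""
    rows = []
    for (x1, y1, x2, y2), score, label in zip(boxes, scores, labels):
        rows.append({
            "name": names.get(label, f"class_{label}"),
            "score": score,
            # Width and height assume the (x1, y1, x2, y2) box layout.
            "width": x2 - x1,
            "height": y2 - y1,
        })
    return rows


rows = describe(boxes, scores, labels, COCO_NAMES)
for row in rows:
    print(f"{row['name']:>6}  score={row['score']:.2f}  "
          f"size={row['width']:.0f}x{row['height']:.0f}")
```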

## Export the model to ONNX

In [13]:
from yolort.runtime.ort_helper import export_onnx

In [14]:
print(f'We are using opset version: {opset_version}')

We are using opset version: 11


In [15]:
export_onnx(model=model, onnx_path=onnx_path, opset_version=opset_version)

  (torch.floor((input.size(i + 2).float() * torch.tensor(scale_factors[i], dtype=torch.float32)).float()))
  img_h, img_w = _get_shape_onnx(img)
  anchors = torch.as_tensor(self.anchor_grids, dtype=torch.float32, device=device).to(dtype=dtype)
  strides = torch.as_tensor(self.strides, dtype=torch.float32, device=device).to(dtype=dtype)
  strides = torch.as_tensor(self.strides, dtype=torch.float32, device=device).to(dtype=dtype)
  for head_output, grid, shift, stride in zip(head_outputs, grids, shifts, strides):


Check that the exported ONNX model is well formed.

In [16]:
# Load the ONNX model
onnx_model = onnx.load(onnx_path)

# Check that the model is well formed
onnx.checker.check_model(onnx_model)

# Print a human readable representation of the graph
# print(onnx.helper.printable_graph(onnx_model.graph))

## Inference on ONNX Runtime backend

Check the version of ONNX Runtime first.

In [17]:
print(f'Starting with onnx {onnx.__version__}, onnxruntime {onnxruntime.__version__}...')

Starting with onnx 1.10.2, onnxruntime 1.10.0...


Prepare the inputs for ONNX Runtime.

In [18]:
inputs, _ = torch.jit._flatten(images)
outputs, _ = torch.jit._flatten(model_out)

In [19]:
inputs = list(map(to_numpy, inputs))
outputs = list(map(to_numpy, outputs))

We provide a pipeline for deploying yolort with ONNX Runtime.

In [20]:
from yolort.runtime import PredictorORT

In [21]:
y_runtime = PredictorORT(onnx_path, device="cpu")

Providers was initialized.
Set inference device to CPU


In [22]:
ort_outs1 = y_runtime.predict(inputs)

Let's measure the inference speed of ONNX Runtime.

In [23]:
%%timeit
y_runtime.predict(inputs)

47.7 ms ± 614 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
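
`%%timeit` is an IPython magic and is not available in a plain script. A roughly equivalent measurement can be sketched with the standard-library `timeit` module; `predict_placeholder` below is a made-up stand-in workload so the snippet runs on its own, and in the tutorial's session it would be replaced by a call to `y_runtime.predict(inputs)`.

```python
import timeit


def predict_placeholder():
    # Stand-in workload; swap in `y_runtime.predict(inputs)` when the
    # predictor and inputs from this tutorial are available.
    return sum(i * i for i in range(1000))


loops, repeats = 10, 7  # same shape as the report above: 7 runs, 10 loops each
runs = timeit.repeat(predict_placeholder, number=loops, repeat=repeats)

# Each entry in `runs` is the total time for `loops` calls, in seconds.
per_loop_ms = [1000 * total / loops for total in runs]
mean_ms = sum(per_loop_ms) / len(per_loop_ms)
print(f"{mean_ms:.3f} ms per loop (mean of {repeats} runs, {loops} loops each)")
```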


### Verify whether the inference results are consistent with PyTorch's

In [24]:
for i in range(0, len(outputs)):
    torch.testing.assert_allclose(outputs[i], ort_outs1[i], rtol=1e-04, atol=1e-07)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

Exported model has been tested with ONNXRuntime, and the result looks good!
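
The `rtol` and `atol` arguments define how much the two backends may disagree: an element passes when `|actual - expected| <= atol + rtol * |expected|`. The scalar helper `is_close` below is a hypothetical illustration of that rule, with sample numbers chosen near the scores shown earlier.

```python
def is_close(actual, expected, rtol=1e-4, atol=1e-7):
    """Scalar version of the element-wise tolerance rule."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# A 2e-5 difference on a value near 0.88 is inside the allowance (about 8.8e-5).
print(is_close(0.88238, 0.88240))  # True
# A 1e-3 difference on the same value is not.
print(is_close(0.88238, 0.88338))  # False
# Near zero the relative term vanishes, so only atol is left.
print(is_close(0.0, 5e-8))         # True
print(is_close(0.0, 1e-6))         # False
```

With these settings the check is effectively relative for box coordinates in the hundreds and nearly exact for values close to zero.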


### Verify another image

When a model is exported with dynamic shapes in trace mode, shape inference may not work for some operators, so we also verify the exported model on a second image with a different shape.
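
To see why a second image is a meaningful test, consider how `size_divisible` could affect the tensor shape that flows through the graph. The sketch below assumes the embedded pre-processing pads each resized image up to the next multiple of `size_divisible`, which is an assumption about yolort's transform; the helper `pad_to_multiple` and the two resized shapes are illustrative.

```python
import math


def pad_to_multiple(height, width, divisor=64):
    """Round each side up to the next multiple of `divisor`."""
    return (
        math.ceil(height / divisor) * divisor,
        math.ceil(width / divisor) * divisor,
    )


# Illustrative shapes after an aspect-preserving resize to a longest side of 640.
portrait = pad_to_multiple(640, 480)
landscape = pad_to_multiple(360, 640)
print(portrait)   # (640, 512)
print(landscape)  # (384, 640)
```

Under that assumption, a portrait and a landscape image produce different internal shapes, so running both exercises the dynamic axes rather than a single traced shape.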

In [25]:
images = [img_two]

In [26]:
with torch.no_grad():
    out_pytorch = model(images)

In [27]:
inputs, _ = torch.jit._flatten(images)
outputs, _ = torch.jit._flatten(out_pytorch)

In [28]:
inputs = list(map(to_numpy, inputs))
outputs = list(map(to_numpy, outputs))

Compute the ONNX Runtime predictions.

In [29]:
ort_outs2 = y_runtime.predict(inputs)

Measure the inference speed of ONNX Runtime again for this image.

In [30]:
%%timeit
y_runtime.predict(inputs)

37.5 ms ± 767 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Verify whether the inference results are consistent with PyTorch's.

In [31]:
for i in range(0, len(outputs)):
    torch.testing.assert_allclose(outputs[i], ort_outs2[i], rtol=1e-04, atol=1e-07)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

Exported model has been tested with ONNXRuntime, and the result looks good!
