# Deploying yolort on TVM

This article is an introductory tutorial on deploying PyTorch YOLOv5 models with the Relay VM.

To begin, PyTorch must be installed.
TorchVision is also required since we will be using it as our model zoo.

A quick solution is to install both via pip:


```shell
pip install torch==1.10.1
pip install torchvision==0.11.2
```

Alternatively, refer to the official site:
https://pytorch.org/get-started/locally/

PyTorch versions should be backwards compatible, but each one should be used
with the matching TorchVision version.

Currently, `TVM` has only been tested with PyTorch 1.7.x and 1.10.x; other versions may be unstable.

This notebook was run on macOS (Apple M1).
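
As a quick sanity check before continuing, the installed versions can be compared against the PyTorch release lines mentioned above. This is a minimal sketch using only the standard library; the helper names `installed_version` and `is_tested_torch` are hypothetical and not part of PyTorch, TVM, or yolort.

```python
from importlib import metadata

# PyTorch release lines that this tutorial reports as tested with TVM.
TESTED_TORCH_PREFIXES = ("1.7.", "1.10.")


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_tested_torch(version: str) -> bool:
    """Return True if `version` belongs to one of the tested release lines."""
    return version.startswith(TESTED_TORCH_PREFIXES)


for name in ("torch", "torchvision"):
    print(name, installed_version(name))
```

A version string such as `1.10.1` passes the check, while `1.9.0` does not; untested versions may still work, so treat a failed check as a warning rather than an error.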

---

Copyright © Most of the code is copied from the [TVM tutorial](https://tvm.apache.org/docs/tutorials/frontend/deploy_object_detection_pytorch.html#sphx-glr-tutorials-frontend-deploy-object-detection-pytorch-py).

In [1]:
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

import numpy as np
import cv2

# PyTorch imports
import torch
from torch import nn
import torchvision

## Load pre-trained `yolov5n` from yolort and do tracing

In [2]:
in_size = 640
input_shape = (in_size, in_size)

In [3]:
from yolort.models import yolov5n
from yolort.relay import get_trace_module

In [4]:
model_func = yolov5n(pretrained=True, size=(in_size, in_size))
script_module = get_trace_module(model_func, input_shape=input_shape)



In [5]:
script_module.graph

graph(%self.1 : __torch__.yolort.relay.trace_wrapper.TraceWrapper,
      %x : Float(1, 3, 640, 640, strides=[1228800, 409600, 640, 1], requires_grad=0, device=cpu)):
  %model : __torch__.yolort.models.yolo_module.YOLOv5 = prim::GetAttr[name="model"](%self.1)
  %4973 : (Tensor, Tensor, Tensor) = prim::CallMethod[name="forward"](%model, %x)
  %4970 : Float(0, 4, strides=[4, 1], requires_grad=0, device=cpu), %4971 : Float(0, strides=[1], requires_grad=0, device=cpu), %4972 : Long(0, strides=[1], requires_grad=0, device=cpu) = prim::TupleUnpack(%4973)
  %3751 : (Float(0, 4, strides=[4, 1], requires_grad=0, device=cpu), Float(0, strides=[1], requires_grad=0, device=cpu), Long(0, strides=[1], requires_grad=0, device=cpu)) = prim::TupleConstruct(%4970, %4971, %4972)
  return (%3751)

## Download a test image and pre-process

In [6]:
from yolort.utils import get_image_from_url

img_source = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/bus.jpg"
# img_source = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/zidane.jpg"
img = get_image_from_url(img_source)

img = img.astype("float32")
img = cv2.resize(img, (in_size, in_size))

img = np.transpose(img / 255.0, [2, 0, 1])
img = np.expand_dims(img, axis=0)
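
The layout conversion performed above can be wrapped in a small function and checked on a synthetic image. The sketch below is an illustration only: the helper name `to_nchw_float` is hypothetical, it assumes an image in height x width x channel order with 8-bit values, and it leaves out the `cv2.resize` step.

```python
import numpy as np


def to_nchw_float(img: np.ndarray) -> np.ndarray:
    """Convert an HWC image with values in [0, 255] to a 1xCxHxW float32 array in [0, 1]."""
    img = img.astype("float32") / 255.0  # scale pixel values to [0, 1]
    img = np.transpose(img, [2, 0, 1])  # HWC -> CHW
    return np.expand_dims(img, axis=0)  # add the batch dimension


# Synthetic all-white image standing in for a real photo (not real data).
dummy = np.full((640, 640, 3), 255, dtype=np.uint8)
batch = to_nchw_float(dummy)
print(batch.shape, batch.dtype)
```

The resulting shape matches the `(1, 3, 640, 640)` input that the traced module expects.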

## Import the graph to Relay

In [7]:
input_name = "input0"
shape_list = [(input_name, (1, 3, *input_shape))]
mod, params = relay.frontend.from_pytorch(script_module, shape_list)

Using injective.cpu for cast based on highest priority (10)
Using injective.cpu for strided_slice based on highest priority (10)
Using reduce.cpu for min based on highest priority (10)
Using injective.cpu for cast based on highest priority (10)
Using injective.cpu for divide based on highest priority (10)
Using injective.cpu for multiply based on highest priority (10)
Using reduce.cpu for max based on highest priority (10)
Using injective.cpu for minimum based on highest priority (10)
Using injective.cpu for floor based on highest priority (10)
Using injective.cpu for cast based on highest priority (10)
Using injective.cpu for strided_slice based on highest priority (10)
Using reduce.cpu for min based on highest priority (10)
Using injective.cpu for cast based on highest priority (10)
Using injective.cpu for divide based on highest priority (10)
Using injective.cpu for multiply based on highest priority (10)
Using reduce.cpu for max based on highest priority (10)
Using injective.cpu fo

## Compile with Relay VM

Note: Currently only the CPU target is supported. For x86 targets, it is
highly recommended to build TVM with Intel MKL and Intel OpenMP to get
the best performance, due to the large dense operators in
torchvision R-CNN models.
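
The target options mentioned in this note can be assembled programmatically. The helper below is a hypothetical convenience, not part of TVM; it only builds strings, using the `-mcpu` and `-libs` flags that appear in the target strings quoted in this tutorial.

```python
from typing import Optional


def make_llvm_target(mcpu: Optional[str] = None, use_mkl: bool = False) -> str:
    """Build an LLVM target string such as "llvm -mcpu=skylake-avx512 -libs=mkl"."""
    parts = ["llvm"]
    if mcpu:
        parts.append(f"-mcpu={mcpu}")
    if use_mkl:
        # Only meaningful if TVM itself was built with MKL support.
        parts.append("-libs=mkl")
    return " ".join(parts)


print(make_llvm_target())
print(make_llvm_target(mcpu="skylake-avx512", use_mkl=True))
```

Keeping the default at plain `"llvm"` gives a portable target string, since the MKL variant depends on how TVM was built.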

In [8]:
# Add "-libs=mkl" to get the best performance on x86 targets.
# For an x86 machine that supports AVX512, the complete target is
# "llvm -mcpu=skylake-avx512 -libs=mkl"
target = "llvm"

with tvm.transform.PassContext(opt_level=3):
    vm_exec = relay.vm.compile(mod, target=target, params=params)

Using injective.cpu for add based on highest priority (10)
Using injective.cpu for sqrt based on highest priority (10)
Using injective.cpu for divide based on highest priority (10)
Using injective.cpu for multiply based on highest priority (10)
Using injective.cpu for expand_dims based on highest priority (10)
Using injective.cpu for negative based on highest priority (10)
Using injective.cpu for multiply based on highest priority (10)
Using injective.cpu for add based on highest priority (10)
Using injective.cpu for expand_dims based on highest priority (10)
Using injective.cpu for add based on highest priority (10)
Using injective.cpu for sqrt based on highest priority (10)
Using injective.cpu for divide based on highest priority (10)
Using injective.cpu for multiply based on highest priority (10)
Using injective.cpu for expand_dims based on highest priority (10)
Using injective.cpu for negative based on highest priority (10)
Using injective.cpu for multiply based on highest priority

## Inference with Relay VM

In [9]:
ctx = tvm.cpu()
vm = VirtualMachine(vm_exec, ctx)
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()

In [10]:
%%timeit
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()

65.2 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
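
`%%timeit` is an IPython magic and is only available inside a notebook. In a plain Python script, a comparable measurement can be taken with the standard `timeit` module. The sketch below approximates what `%%timeit` reports rather than reimplementing it; `benchmark` and `dummy_workload` are hypothetical names, and the workload stands in for the `vm.set_input` and `vm.run` calls.

```python
import statistics
import timeit


def benchmark(fn, repeat: int = 7, number: int = 10):
    """Return (mean, std. dev.) of the per-call time of `fn` in milliseconds."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call_ms = [t / number * 1e3 for t in totals]
    return statistics.mean(per_call_ms), statistics.stdev(per_call_ms)


def dummy_workload():
    # Placeholder for: vm.set_input("main", **{input_name: img}); vm.run()
    return sum(i * i for i in range(10_000))


mean_ms, std_ms = benchmark(dummy_workload)
print(f"{mean_ms:.3f} ms +/- {std_ms:.3f} ms per loop")
```

The defaults of 7 runs with 10 loops each mirror the configuration shown in the output above.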


## Get boxes with score larger than 0.6

In [11]:
score_threshold = 0.6
boxes = tvm_res[0].asnumpy().tolist()
valid_boxes = []
for i, score in enumerate(tvm_res[1].asnumpy().tolist()):
    if score > score_threshold:
        valid_boxes.append(boxes[i])
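    else:
        # Stop at the first score at or below the threshold; this assumes scores are sorted in descending order.
        break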

print(f"Get {len(valid_boxes)} valid boxes")

Get 4 valid boxes
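
The loop above stops at the first score that fails the threshold, which is only correct when the scores arrive sorted from high to low. If that ordering is not guaranteed, every box can be checked instead. The sketch below uses made-up boxes and scores purely for illustration; `filter_boxes` is a hypothetical helper.

```python
def filter_boxes(boxes, scores, score_threshold=0.6):
    """Keep every box whose score exceeds the threshold, regardless of order."""
    return [box for box, score in zip(boxes, scores) if score > score_threshold]


# Made-up example values, not actual model output.
boxes = [[0, 0, 10, 10], [5, 5, 20, 20], [1, 2, 3, 4]]
scores = [0.9, 0.3, 0.7]

valid_boxes = filter_boxes(boxes, scores)
print(f"Get {len(valid_boxes)} valid boxes")
```

With unsorted scores like these, an early `break` would miss the third box, while the full scan keeps it.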


## Verify the Inference Output on the TVM backend

In [12]:
with torch.no_grad():
    torch_res = script_module(torch.from_numpy(img))

In [13]:
for i in range(len(torch_res)):
    torch.testing.assert_allclose(torch_res[i], tvm_res[i].asnumpy(), rtol=1e-4, atol=1e-4)

print("Exported model has been tested with TVM Runtime, and the result looks good!")

Exported model has been tested with TVM Runtime, and the result looks good!
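
The comparison above passes when every element satisfies `|actual - expected| <= atol + rtol * |expected|`, the closeness criterion documented for `torch.isclose`. A minimal pure-Python sketch of that criterion follows; `within_tolerance` is a hypothetical helper that works on flat lists and is not a replacement for `torch.testing.assert_allclose`.

```python
def within_tolerance(actual, expected, rtol=1e-4, atol=1e-4):
    """Return True if every pair satisfies |a - e| <= atol + rtol * |e|."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


# Small differences fall inside the tolerance; a 1% difference does not.
print(within_tolerance([1.0, 2.0], [1.00005, 2.0001]))
print(within_tolerance([1.0], [1.01]))
```

Because the bound grows with the magnitude of the expected value, large box coordinates are allowed a proportionally larger absolute difference than small scores.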
