# Deploying yolort on TVM

This article is an introductory tutorial on deploying PyTorch YOLOv5 models with the Relay VM.

To begin, PyTorch must be installed.
TorchVision is also required, since we use it as our model zoo.

A quick way to install both is via pip:

```shell
pip install torch==1.7.1
pip install torchvision==0.8.2
```

Alternatively, refer to the official site:
https://pytorch.org/get-started/locally/

PyTorch versions should be backwards compatible, but each should be used
with the matching TorchVision version.

Currently, TVM has only been tested with PyTorch 1.7. Other versions may be unstable.
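
One way to confirm that an environment matches the tested combination is to compare the installed versions against the pins above. The following is a minimal sketch, not part of yolort or TVM: `TESTED_VERSIONS`, `major_minor`, `check_versions` and `installed_versions` are hypothetical names, and only the major.minor prefix is compared.

```python
from importlib import metadata

# Versions pinned in the install command above (major.minor only).
TESTED_VERSIONS = {"torch": "1.7", "torchvision": "0.8"}


def major_minor(version):
    """Reduce a version string such as '1.7.1+cpu' to '1.7'."""
    public = version.split("+")[0]
    return ".".join(public.split(".")[:2])


def check_versions(installed):
    """Return the packages whose major.minor differs from the tested one."""
    mismatches = {}
    for name, expected in TESTED_VERSIONS.items():
        found = installed.get(name)
        if found is None or major_minor(found) != expected:
            mismatches[name] = found
    return mismatches


def installed_versions():
    """Look up installed versions; missing packages map to None."""
    versions = {}
    for name in TESTED_VERSIONS:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # An empty dict means both packages match the tested versions.
    print(check_versions(installed_versions()))
```

An empty result means the environment matches; any entry in the returned dict names a package that is missing or on a different release line.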

---

Credit: Most of the code is copied from the [TVM tutorial](https://tvm.apache.org/docs/tutorials/frontend/deploy_object_detection_pytorch.html#sphx-glr-tutorials-frontend-deploy-object-detection-pytorch-py).

In [1]:
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

import numpy as np
import cv2

# PyTorch imports
import torch
from torch import nn
import torchvision

## Load pre-trained `yolov5s` from yolort and do tracing

In [2]:
in_size = 416
input_shape = (in_size, in_size)

In [3]:
from yolort.models import yolov5s
from yolort.relaying import get_trace_module

In [4]:
model_func = yolov5s(pretrained=True)
script_module = get_trace_module(model_func, input_shape=input_shape)



Alternatively, load the model via `torch.hub`:

```python
model = torch.hub.load('zhiqwang/yolov5-rt-stack', 'yolov5s', pretrained=True)
```

In [5]:
script_module.graph

graph(%self.1 : __torch__.yolort.relaying.trace_wrapper.TraceWrapper,
      %images : Float(1:519168, 3:173056, 416:416, 416:1, requires_grad=0, device=cpu)):
  %4399 : __torch__.yolort.models.yolo_module.YOLOModule = prim::GetAttr[name="model"](%self.1)
  %4778 : (Tensor, Tensor, Tensor) = prim::CallMethod[name="forward"](%4399, %images)
  %4775 : Float(14:4, 4:1, requires_grad=0, device=cpu), %4776 : Float(14:1, requires_grad=0, device=cpu), %4777 : Long(14:1, requires_grad=0, device=cpu) = prim::TupleUnpack(%4778)
  %3515 : (Float(14:4, 4:1, requires_grad=0, device=cpu), Float(14:1, requires_grad=0, device=cpu), Long(14:1, requires_grad=0, device=cpu)) = prim::TupleConstruct(%4775, %4776, %4777)
  return (%3515)

## Download a test image and pre-process

In [6]:
from yolort.utils import get_image_from_url

img = get_image_from_url("https://gitee.com/zhiqwang/yolov5-rt-stack/raw/master/test/assets/bus.jpg")
# img = cv2.imread('../test/assets/bus.jpg')

img = img.astype("float32")
img = cv2.resize(img, (in_size, in_size))

img = np.transpose(img / 255.0, [2, 0, 1])
img = np.expand_dims(img, axis=0)
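
To see what the layout change does without downloading anything, the same scaling, transpose and batch-axis steps can be run on a synthetic image. This is a sketch under assumptions: `fake_img` and `to_nchw` are illustrative names, the image is random data, and the `cv2.resize` step is skipped because the array is already created at the target size.

```python
import numpy as np

in_size = 416

# Stand-in for an already-resized image in HWC layout with uint8 pixels.
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(in_size, in_size, 3), dtype=np.uint8)


def to_nchw(image):
    """Scale to [0, 1], move channels first, and add a batch axis."""
    image = image.astype("float32") / 255.0
    image = np.transpose(image, [2, 0, 1])  # HWC -> CHW
    return np.expand_dims(image, axis=0)    # CHW -> NCHW


batch = to_nchw(fake_img)
print(batch.shape, batch.dtype)  # (1, 3, 416, 416) float32
```

The resulting shape is the one the traced module expects for a 416 x 416 input.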

## Import the graph to Relay

In [7]:
input_name = "input0"
shape_list = [(input_name, (1, 3, *input_shape))]
mod, params = relay.frontend.from_pytorch(script_module, shape_list)



## Compile with Relay VM

Note: Currently only the CPU target is supported. For x86 targets, it is
highly recommended to build TVM with Intel MKL and Intel OpenMP to get
the best performance, because of the large dense operators in
torchvision R-CNN models.

In [8]:
# Add "-libs=mkl" to get best performance on x86 target.
# For an x86 machine that supports AVX512, the complete target is
# "llvm -mcpu=skylake-avx512 -libs=mkl"
target = "llvm"

with tvm.transform.PassContext(opt_level=3, disabled_pass=["FoldScaleAxis"]):
    vm_exec = relay.vm.compile(mod, target=target, params=params)
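
The target strings mentioned in the comments above differ only in their optional flags, so they can be assembled from a couple of switches. The helper below is a hypothetical convenience (`build_target` is not a TVM function); it only produces the strings already quoted in this section.

```python
def build_target(mcpu=None, use_mkl=False):
    """Assemble an LLVM target string from optional CPU and MKL flags."""
    parts = ["llvm"]
    if mcpu:
        parts.append(f"-mcpu={mcpu}")
    if use_mkl:
        parts.append("-libs=mkl")
    return " ".join(parts)


print(build_target())                                      # llvm
print(build_target(mcpu="skylake-avx512", use_mkl=True))   # llvm -mcpu=skylake-avx512 -libs=mkl
```

The returned string can be passed as `target` to the compile call; `-libs=mkl` only makes sense if TVM itself was built with MKL.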



## Inference with Relay VM

In [9]:
ctx = tvm.cpu()
vm = VirtualMachine(vm_exec, ctx)
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()

In [10]:
%%timeit
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()

88.6 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
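
Outside a notebook, where the `%%timeit` magic is unavailable, a similar measurement can be taken with the standard `timeit` module. This is a rough sketch: `run_once` is a placeholder workload standing in for the `vm.set_input` and `vm.run` calls, so the numbers it prints are unrelated to the timing reported above.

```python
import statistics
import timeit


def run_once():
    """Placeholder workload; swap in vm.set_input(...) and vm.run()."""
    return sum(i * i for i in range(1000))


# 7 repeats of 10 calls each, mirroring the 7 runs of 10 loops reported above.
totals = timeit.repeat(run_once, repeat=7, number=10)
per_loop_ms = [1000 * t / 10 for t in totals]

mean_ms = statistics.mean(per_loop_ms)
std_ms = statistics.pstdev(per_loop_ms)
print(f"{mean_ms:.3f} ms ± {std_ms:.3f} ms per loop")
```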


## Get boxes with scores larger than 0.6

In [11]:
score_threshold = 0.6
boxes = tvm_res[0].asnumpy().tolist()
valid_boxes = []
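# Scores are assumed to be sorted in descending order, so stop at the first one below the threshold.
for i, score in enumerate(tvm_res[1].asnumpy().tolist()):
    if score > score_threshold:
        valid_boxes.append(boxes[i])
    else:
        break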

print(f"Get {len(valid_boxes)} valid boxes")

Get 3 valid boxes
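
The loop above stops at the first score that fails the threshold, which is only correct when the scores arrive sorted from high to low. A boolean mask gives the same filtering without that assumption. The sketch below uses made-up boxes and scores in place of `tvm_res[0]` and `tvm_res[1]`, with the scores deliberately left unsorted.

```python
import numpy as np

# Synthetic detections standing in for tvm_res[0] and tvm_res[1].
boxes = np.array([
    [10.0, 20.0, 110.0, 220.0],
    [30.0, 40.0, 130.0, 240.0],
    [50.0, 60.0, 150.0, 260.0],
    [70.0, 80.0, 170.0, 280.0],
], dtype="float32")
scores = np.array([0.91, 0.35, 0.72, 0.10], dtype="float32")

score_threshold = 0.6

keep = scores > score_threshold  # boolean mask, one entry per box
valid_boxes = boxes[keep]

print(f"Get {len(valid_boxes)} valid boxes")  # Get 2 valid boxes
```

With these unsorted scores, an early-exit loop would report a single box, while the mask keeps both boxes that pass the threshold.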


## Verify the inference output on the TVM backend

In [12]:
with torch.no_grad():
    torch_res = script_module(torch.from_numpy(img))

In [13]:
for i in range(len(torch_res)):
    torch.testing.assert_allclose(torch_res[i], tvm_res[i].asnumpy(), rtol=1e-03, atol=1e-05)

print("Exported model has been tested with TVM Runtime, and the result looks good!")

Exported model has been tested with TVM Runtime, and the result looks good!
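
To get a feel for what `rtol=1e-03` and `atol=1e-05` allow, the same tolerances can be tried on a few hand-picked numbers. The sketch below uses NumPy's analogous `np.testing.assert_allclose`, whose documented criterion is `|actual - desired| <= atol + rtol * |desired|`; the arrays are synthetic and are not model outputs.

```python
import numpy as np

rtol, atol = 1e-03, 1e-05

reference = np.array([100.0, 0.5, 0.0])
close = reference + np.array([0.05, 0.0004, 0.000005])  # within tolerance
far = reference + np.array([0.2, 0.0, 0.0])             # first entry too far

# Element-wise bound: |actual - desired| <= atol + rtol * |desired|
bound = atol + rtol * np.abs(reference)
print(np.abs(close - reference) <= bound)

# Passes silently because every element is inside its bound.
np.testing.assert_allclose(close, reference, rtol=rtol, atol=atol)

try:
    np.testing.assert_allclose(far, reference, rtol=rtol, atol=atol)
except AssertionError:
    print("mismatch detected")
```

Large values are governed mostly by the relative term, while values near zero are governed by the absolute term, which is why both tolerances are passed.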
