# Compile YOLOv5 Models

This is an introductory tutorial on deploying PyTorch YOLOv5 models with the TVM Relay VM.

To begin, PyTorch must be installed.
TorchVision is also required since we will be using it as our model zoo.

A quick solution is to install both via pip:


```shell
pip install torch==1.7.1
pip install torchvision==0.8.2
```

Alternatively, refer to the official installation guide:
https://pytorch.org/get-started/locally/

PyTorch versions should be backwards compatible, but each should be used
with the matching TorchVision version.

Currently, `TVM` has only been tested with PyTorch 1.7. Other versions may be unstable.
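
Because only the PyTorch 1.7 series has been tested, a quick guard at the top of a script can catch a mismatched environment early. The sketch below is one possible way to do this; `is_tested_torch_version` and `TESTED_TORCH_SERIES` are hypothetical names, not part of TVM or PyTorch. It operates on a plain version string, so it can be fed `torch.__version__`.

```python
# Hypothetical helper (not part of TVM or PyTorch): checks whether a version
# string such as torch.__version__ belongs to the tested 1.7 series.
TESTED_TORCH_SERIES = (1, 7)


def is_tested_torch_version(version):
    # Drop a local build suffix such as "+cpu" or "+cu110".
    public = version.split("+")[0]
    parts = public.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Unparseable strings are treated as untested.
        return False
    return major_minor == TESTED_TORCH_SERIES


print(is_tested_torch_version("1.7.1"))
print(is_tested_torch_version("1.8.0+cpu"))
```

In a real script this would typically be called as `is_tested_torch_version(torch.__version__)`, with a warning printed when it returns `False`.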

---

Credit: Most of the code is copied from the [TVM tutorial](https://tvm.apache.org/docs/tutorials/frontend/deploy_object_detection_pytorch.html#sphx-glr-tutorials-frontend-deploy-object-detection-pytorch-py).

In [1]:
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

import numpy as np
import cv2

# PyTorch imports
import torch
from torch import nn
import torchvision

## Load pre-trained `yolov5s` from yolort and do tracing

In [2]:
in_size = 416

input_shape = (1, 3, in_size, in_size)


def do_trace(model, inp):
    model_trace = torch.jit.trace(model, inp)
    model_trace.eval()
    return model_trace


def dict_to_tuple(out_dict):
    if "masks" in out_dict.keys():
        return out_dict["boxes"], out_dict["scores"], out_dict["labels"], out_dict["masks"]
    return out_dict["boxes"], out_dict["scores"], out_dict["labels"]


class TraceWrapper(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, inp):
        out = self.model(inp)
        return dict_to_tuple(out[0])
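
The wrapper above takes the first element of the model output, a dictionary, and flattens it into a tuple with a fixed order, presumably because tracing is simplest when the outputs are plain tensors or tuples of tensors. The sketch below repeats `dict_to_tuple` and applies it to a stand-in dictionary to show the resulting order; the plain lists in `fake_detection` are made-up placeholders for the tensors a real model would return.

```python
def dict_to_tuple(out_dict):
    # Same logic as in the cell above: fixed output order, masks only if present.
    if "masks" in out_dict:
        return out_dict["boxes"], out_dict["scores"], out_dict["labels"], out_dict["masks"]
    return out_dict["boxes"], out_dict["scores"], out_dict["labels"]


# Placeholder detection for one image; a real model returns tensors here.
fake_detection = {
    "labels": [5],
    "scores": [0.9],
    "boxes": [[10.0, 20.0, 110.0, 220.0]],
}

boxes, scores, labels = dict_to_tuple(fake_detection)
print(boxes, scores, labels)
```

The output order is always boxes, scores, labels, regardless of the key order in the dictionary.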

In [3]:
from yolort.models import yolov5s

model_func = yolov5s(upstream_version='v4.0', export_friendly=True, pretrained=True)

In [4]:
# Alternatively, load the model via torch.hub:
# model_func = torch.hub.load('zhiqwang/yolov5-rt-stack', 'yolov5s', pretrained=True)

In [5]:
model = TraceWrapper(model_func)

model.eval()
inp = torch.Tensor(np.random.uniform(0.0, 250.0, size=(1, 3, in_size, in_size)))

with torch.no_grad():
    out = model(inp)
    script_module = do_trace(model, inp)



In [6]:
script_module.graph

graph(%self.1 : __torch__.TraceWrapper,
      %images : Float(1:519168, 3:173056, 416:416, 416:1, requires_grad=0, device=cpu)):
  %4495 : __torch__.yolort.models.yolo_module.YOLOModule = prim::GetAttr[name="model"](%self.1)
  %4874 : (Tensor, Tensor, Tensor) = prim::CallMethod[name="forward"](%4495, %images)
  %4871 : Float(300:4, 4:1, requires_grad=0, device=cpu), %4872 : Float(300:1, requires_grad=0, device=cpu), %4873 : Long(300:1, requires_grad=0, device=cpu) = prim::TupleUnpack(%4874)
  %3611 : (Float(300:4, 4:1, requires_grad=0, device=cpu), Float(300:1, requires_grad=0, device=cpu), Long(300:1, requires_grad=0, device=cpu)) = prim::TupleConstruct(%4871, %4872, %4873)
  return (%3611)

## Load a test image and pre-process

In [7]:
img_path = './test/assets/bus.jpg'

img = cv2.imread(img_path).astype("float32")
img = cv2.resize(img, (in_size, in_size))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.transpose(img / 255.0, [2, 0, 1])
img = np.expand_dims(img, axis=0)
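
The pre-processing above turns an OpenCV image (height × width × channels, values in 0 to 255) into the batch layout given in `input_shape` (batch × channels × height × width, values in 0 to 1). The sketch below replays only the layout and scaling steps on a synthetic array, so the shapes can be checked without an image file; `fake_rgb` is a made-up stand-in for the resized, color-converted image.

```python
import numpy as np

in_size = 416

# Stand-in for the resized RGB image: height x width x channels, values in [0, 255].
fake_rgb = np.random.uniform(0.0, 255.0, size=(in_size, in_size, 3)).astype("float32")

# HWC -> CHW, scaled to [0, 1], then a leading batch axis: 1 x 3 x H x W.
chw = np.transpose(fake_rgb / 255.0, [2, 0, 1])
batch = np.expand_dims(chw, axis=0)

print(batch.shape, batch.dtype)
```

The resulting shape matches `input_shape` from the tracing step, which is what the Relay import expects for its input.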

## Import the graph to Relay

In [8]:
input_name = "input0"
shape_list = [(input_name, input_shape)]
mod, params = relay.frontend.from_pytorch(script_module, shape_list)



## Compile with Relay VM

Note: Currently only the CPU target is supported. For x86 targets, it is
highly recommended to build TVM with Intel MKL and Intel OpenMP to get the
best performance, because of the large dense operators in
torchvision R-CNN models.

In [9]:
# Add "-libs=mkl" to get the best performance on x86 targets.
# For an x86 machine that supports AVX512, the complete target is
# "llvm -mcpu=skylake-avx512 -libs=mkl"
target = "llvm"

with tvm.transform.PassContext(opt_level=3, disabled_pass=["FoldScaleAxis"]):
    vm_exec = relay.vm.compile(mod, target=target, params=params)
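
The target string mentioned in the comments of this cell is a base backend followed by optional flags. A small helper can make those choices explicit; `build_target` is a hypothetical convenience function, and it only concatenates the flags already named in this section (`-mcpu=...` and `-libs=mkl`). Whether the MKL flag is usable depends on how TVM was built.

```python
def build_target(mcpu=None, use_mkl=False):
    # Hypothetical helper: assembles an LLVM target string for TVM.
    parts = ["llvm"]
    if mcpu:
        parts.append("-mcpu={}".format(mcpu))
    if use_mkl:
        parts.append("-libs=mkl")
    return " ".join(parts)


print(build_target())
print(build_target(mcpu="skylake-avx512", use_mkl=True))
```

The returned string can be passed as `target` to `relay.vm.compile` in place of the literal `"llvm"`.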



## Inference with Relay VM

In [10]:
ctx = tvm.cpu()
vm = VirtualMachine(vm_exec, ctx)
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()

In [11]:
%%time
vm.set_input("main", **{input_name: img})
tvm_res = vm.run()

CPU times: user 684 ms, sys: 832 ms, total: 1.52 s
Wall time: 39.2 ms
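
`%%time` reports a single run, which can be noisy. A sketch of a repeated measurement follows; `benchmark` and `run_once` are hypothetical names, and in this notebook `run_once` would wrap the `vm.set_input(...)` and `vm.run()` calls. A dummy workload is used here so the snippet runs on its own.

```python
import time


def benchmark(run_once, warmup=2, repeat=10):
    # Hypothetical helper: returns (mean, min) wall time in milliseconds.
    for _ in range(warmup):
        run_once()
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)
    return sum(timings) / len(timings), min(timings)


def dummy_workload():
    # Placeholder for the VM inference call.
    return sum(i * i for i in range(10000))


mean_ms, min_ms = benchmark(dummy_workload)
print("mean {:.3f} ms, min {:.3f} ms".format(mean_ms, min_ms))
```

The warm-up runs are excluded from the statistics so that one-time setup costs do not skew the mean.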


## Get boxes with score larger than 0.6

In [12]:
score_threshold = 0.6
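boxes = tvm_res[0].asnumpy().tolist()
valid_boxes = []
# Stop at the first score that fails the threshold; this relies on the
# scores being sorted in descending order.
for i, score in enumerate(tvm_res[1].asnumpy().tolist()):
    if score > score_threshold:
        valid_boxes.append(boxes[i])
    else:
        break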

print("Get {} valid boxes".format(len(valid_boxes)))

Get 4 valid boxes
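
The loop in the last cell stops at the first score that fails the threshold, so it is only correct when the scores arrive in descending order. The sketch below filters without relying on that ordering and also maps boxes back to the original image size. Both helpers are hypothetical. The rescaling assumes boxes are in `(x1, y1, x2, y2)` format relative to the `in_size` × `in_size` input, which this tutorial does not state explicitly, and the detections and original image size in the usage lines are made up.

```python
def filter_boxes(boxes, scores, score_threshold):
    # Keep every box whose score exceeds the threshold, regardless of ordering.
    return [box for box, score in zip(boxes, scores) if score > score_threshold]


def rescale_boxes(boxes, in_size, orig_width, orig_height):
    # Assumes (x1, y1, x2, y2) boxes in the resized in_size x in_size image.
    sx = orig_width / in_size
    sy = orig_height / in_size
    return [[x1 * sx, y1 * sy, x2 * sx, y2 * sy] for x1, y1, x2, y2 in boxes]


# Made-up detections, deliberately not sorted by score.
boxes = [[0.0, 0.0, 208.0, 208.0], [104.0, 104.0, 416.0, 416.0], [10.0, 10.0, 20.0, 20.0]]
scores = [0.3, 0.9, 0.7]

valid = filter_boxes(boxes, scores, 0.6)
print(rescale_boxes(valid, in_size=416, orig_width=832, orig_height=1248))
```

Separate horizontal and vertical scale factors are used because `cv2.resize` stretches the image to a square without preserving its aspect ratio.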
