Benchmark of TVM quantized model on CUDA

This repository contains benchmark code measuring the int8 inference speed of TVM, for the blog post Automating Optimization of Quantized Deep Learning Models on CUDA. Benchmarks of MXNet and TensorRT are provided as baselines.
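For context, int8 quantization maps floating-point tensors to 8-bit integers through a scale factor. The sketch below shows symmetric per-tensor quantization in plain Python; it is illustrative only (TVM's quantization pass chooses scales per layer and supports additional schemes), and `quantize_int8` is a hypothetical helper, not code from this repository.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to int8 with a
    single scale so that x is approximately q * scale."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0  # largest magnitude maps to +/-127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

# The value with the largest magnitude (-1.0) maps to -127, and
# q[i] * scale recovers each input up to rounding error.
q, scale = quantize_int8([0.5, -1.0, 0.25])
```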

How to Run

TVM

The benchmark is conducted with tvm@e22b58, built with both LLVM and CUDA enabled. A CUDA device with Compute Capability 6.1 or higher is required, since the benchmark relies on the dp4a instruction.
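The dp4a instruction computes a 4-way int8 dot product accumulated into an int32 result, which is what makes int8 inference fast on these GPUs: TVM's quantized schedules map the inner reduction loops of conv2d/dense onto it. A pure-Python sketch of its semantics (a hypothetical helper for illustration, not code from this repository):

```python
def dp4a(a, b, c):
    """Semantics of one dp4a instruction: dot product of two vectors
    of four int8 values, accumulated into an int32 value c."""
    assert len(a) == len(b) == 4
    for x, y in zip(a, b):
        assert -128 <= x <= 127 and -128 <= y <= 127  # int8 range
        c += x * y  # products and sum accumulate in 32-bit precision
    return c

dp4a([1, 2, 3, 4], [5, 6, 7, 8], 0)  # 1*5 + 2*6 + 3*7 + 4*8 = 70
```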

We only provide auto-tuning logs for the NVIDIA GTX 1080. To run on other devices, follow the AutoTVM tutorial to generate your own tuning logs.

python3 run_tvm.py --log_file logs/history_best_1080.log

MXNet

MXNet 1.4 and cuDNN 7.3 or later are required.

python3 run_mxnet.py

TensorRT

TensorRT 5 is required. We use ONNX models as input; they are exported from MXNet when the benchmark script runs.

cd TensorRT; make; cd -;
python3 run_tensorrt.py
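All three scripts report inference latency. The usual measurement pattern, a few warmup runs followed by timed repeats, can be sketched as follows (`measure_latency` is a hypothetical helper for illustration, not this repository's actual code):

```python
import time

def measure_latency(run_once, warmup=10, repeat=100):
    """Return the average latency of run_once() in milliseconds,
    after discarding warmup iterations."""
    for _ in range(warmup):   # warm caches, JIT, and GPU clocks
        run_once()
    start = time.perf_counter()
    for _ in range(repeat):
        run_once()
    return (time.perf_counter() - start) / repeat * 1000.0
```

Usage would look like `measure_latency(lambda: model(batch))`. Note that on GPU one must synchronize the device inside `run_once` for the timing to be meaningful.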

Result

[Figure: benchmark results]
