
Add benchmark models that are not easily accessible #5

Merged · 4 commits · Mar 15, 2022

Conversation

masahi (Contributor) commented Mar 14, 2022

The motivation: there has been growing interest in testing and benchmarking int8 BERT, but BERT and other transformer models are hard to quantize or import into TVM properly, so until recently they were not available to us for benchmarking.

Recently I found that the NVIDIA FasterTransformer repo has an example of quantizing BERT by post-training quantization (PTQ) or quantization-aware training (QAT) using TensorRT's pytorch_quantization tool. And thanks to the recent work in apache/tvm#10239, we can now import those "fake-quantized" QAT BERT models into Relay and convert them into fully-integer models.
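For reference, here is a minimal sketch of the PTQ calibration flow with pytorch_quantization. This is not the FasterTransformer script itself; build_bert and calib_loader are hypothetical placeholders, and the quantizer calls follow the toolkit's documented TensorQuantizer API.

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Replace torch.nn layers with quantized variants that carry TensorQuantizers.
# Must run before the model is constructed.
quant_modules.initialize()

model = build_bert()          # hypothetical: construct BERT as usual
calib_loader = get_batches()  # hypothetical: a handful of calibration batches

# Phase 1: run calibration with quantization off, statistics collection on.
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.disable_quant()
        m.enable_calib()

with torch.no_grad():
    for batch in calib_loader:
        model(**batch)

# Phase 2: load the computed amax ranges and re-enable fake quantization.
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.load_calib_amax()
        m.enable_quant()
        m.disable_calib()
```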

Example usages will be provided soon in the tvm repo under python/tvm/meta_schedule/testing/XXX.py. Also see #5 (comment)

I wonder if we need to worry about license issues?

cc @junrushao1994 @areusch @tqchen @comaniac

@masahi changed the title from "Add benchmark models that are not easily accesible" to "Add benchmark models that are not easily accessible" on Mar 14, 2022
areusch commented Mar 14, 2022

cc @driazati as there's some overlap with tlc-pack/ci-data

comaniac commented

IIUC, you want to add the serialized binary files of these models to this repo. In terms of the license, I'd like to confirm: did you generate these binary files yourself based on FasterTransformer, or are they cloned directly from somewhere in that repo?

I checked the FasterTransformer repo and it is under the Apache-2.0 license, so it is fine for us to use any code from that repo. We only need to add one line saying this is modified from the FasterTransformer repo. In the case of binary files, I think a separate README.md under the same directory also works.

masahi (Contributor) commented Mar 14, 2022

Yes, I generated both of them myself. I added an ugly hack to one of the scripts in FasterTransformer to manually export the model to ONNX. A new README is already there; I can add more details on the export process if desired.

I was not aware of ci-data, but these models are not for CI, so I think this repo is a better fit.
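The export hack mentioned above is not shown in the thread. A plausible sketch of a fake-quant ONNX export with pytorch_quantization, assuming the calibrated model from the PTQ sketch and made-up file name, input names, and shapes:

```python
import torch
from pytorch_quantization import nn as quant_nn

# Emit standard QuantizeLinear/DequantizeLinear nodes rather than the
# toolkit's custom fake-quant ops, so other frameworks can read the graph.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

batch, seq_len = 1, 128  # assumed shapes
dummy = (
    torch.zeros(batch, seq_len, dtype=torch.int64),  # input_ids (assumed name)
    torch.zeros(batch, seq_len, dtype=torch.int64),  # segment_ids
    torch.ones(batch, seq_len, dtype=torch.int64),   # input_mask
)
torch.onnx.export(
    model,  # the calibrated model from the PTQ sketch above
    dummy,
    "bert-base-qat.onnx",  # assumed file name
    input_names=["input_ids", "segment_ids", "input_mask"],
    output_names=["logits"],
    opset_version=13,
)
```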

comaniac commented

Got it. Then IMHO we just need to explicitly say something like "generated from XXX under the Apache-2.0 license" in the README.

masahi (Contributor) commented Mar 14, 2022

OK, I added more details on the export process and a "licensed under Apache-2.0" blurb. The README is here: https://github.com/masahi/TLCBench/blob/bench-models/models/README.md

I think it is good to go.


comaniac left a comment


LGTM

masahi (Contributor) commented Mar 15, 2022

An example of how to use quantized BERT (running it requires apache/tvm#10596):
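The example script itself is not reproduced in this dump. A minimal sketch of importing the fake-quantized ONNX model and benchmarking it with Relay, where the file path, input names, shapes, and repeat count are assumptions:

```python
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

batch, seq_len = 1, 128  # assumed shapes
shapes = {
    "input_ids": (batch, seq_len),  # assumed input names
    "segment_ids": (batch, seq_len),
    "input_mask": (batch, seq_len),
}

# Import the fake-quantized ONNX model into Relay.
model = onnx.load("models/bert-base-qat.onnx")  # assumed path
mod, params = relay.frontend.from_onnx(model, shapes)

# Rewrite the fake-quantized graph into real integer ops (apache/tvm#10239).
mod = relay.transform.FakeQuantizationToInteger()(mod)

target = "cuda"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.device(target, 0)
rt = graph_executor.GraphModule(lib["default"](dev))
for name, shape in shapes.items():
    rt.set_input(name, np.zeros(shape, dtype="int64"))

# Prints a mean/median/max/min/std summary over the timed runs.
print(rt.benchmark(dev, repeat=10))
```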

Example output:

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Evaluate inference time cost with target cuda ... 
Execution time summary:                     
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  18.4024      18.2218      19.6602      18.1031       0.4598   

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Evaluate inference time cost with target cuda -libs=cublas ...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
   9.2887       9.2200       9.7776       9.1559       0.2160   
