
Add benchmark models that are not easily accessible #5

Merged · 4 commits · Mar 15, 2022

Conversation

masahi (Contributor) commented Mar 14, 2022

The motivation: there has been growing interest in testing and benchmarking int8 BERT, but BERT and other transformer models are hard to quantize or import into TVM properly, so until recently they were not available to us for benchmarking.

Recently I found that the NVIDIA FasterTransformer repo has an example of quantizing BERT by post-training quantization (PTQ) or quantization-aware training (QAT) using TensorRT's pytorch_quantization tool. And thanks to the recent work in apache/tvm#10239, we can now import those "fake-quantized" QAT BERT models into Relay and convert them into fully-integer models.
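For reference, here is a minimal sketch of the PTQ calibration flow with pytorch_quantization. This is not the FasterTransformer script itself; build_bert and calib_loader are hypothetical placeholders, and the quantizer calls follow the toolkit's documented TensorQuantizer API.

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Replace torch.nn layers with quantized variants that carry TensorQuantizers.
# Must run before the model is constructed.
quant_modules.initialize()

model = build_bert()          # hypothetical: construct BERT as usual
calib_loader = get_batches()  # hypothetical: a handful of calibration batches

# Phase 1: run calibration with quantization off, statistics collection on.
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.disable_quant()
        m.enable_calib()

with torch.no_grad():
    for batch in calib_loader:
        model(**batch)

# Phase 2: load the computed amax ranges and re-enable fake quantization.
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.load_calib_amax()
        m.enable_quant()
        m.disable_calib()
```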

Example usages will be provided soon in the tvm repo under python/tvm/meta_schedule/testing/XXX.py. Also see #5 (comment)

I wonder if we need to worry about license issues?

cc @junrushao1994 @areusch @tqchen @comaniac

@masahi changed the title from "Add benchmark models that are not easily accesible" to "Add benchmark models that are not easily accessible" on Mar 14, 2022
areusch commented Mar 14, 2022

cc @driazati as there's some overlap with tlc-pack/ci-data

comaniac commented

IIUC, you want to add the serialized binary files of these models to this repo. In terms of the license, I'd like to confirm: did you generate these binary files yourself based on FasterTransformer, or are they cloned directly from somewhere in that repo?

I checked the FasterTransformer repo and it is under the Apache-2.0 license, so it is fine for us to use any code from that repo. We only need to add one line saying this is modified from the FasterTransformer repo. In the case of binary files, I think a separate README.md under the same directory also works.

masahi (Contributor) commented Mar 14, 2022

Yes, I generated both of them myself. I added an ugly hack to one of the scripts in FasterTransformer to manually export the model to ONNX. A new README is already there; I can add more details on the export process if desired.

I was not aware of ci-data, but these models are not for CI, so I think this repo is a better fit.
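The export hack mentioned above is not shown in the thread. A plausible sketch of a fake-quant ONNX export with pytorch_quantization, assuming the calibrated model from the PTQ sketch and made-up file name, input names, and shapes:

```python
import torch
from pytorch_quantization import nn as quant_nn

# Emit standard QuantizeLinear/DequantizeLinear nodes rather than the
# toolkit's custom fake-quant ops, so other frameworks can read the graph.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

batch, seq_len = 1, 128  # assumed shapes
dummy = (
    torch.zeros(batch, seq_len, dtype=torch.int64),  # input_ids (assumed name)
    torch.zeros(batch, seq_len, dtype=torch.int64),  # segment_ids
    torch.ones(batch, seq_len, dtype=torch.int64),   # input_mask
)
torch.onnx.export(
    model,  # the calibrated model from the PTQ sketch above
    dummy,
    "bert-base-qat.onnx",  # assumed file name
    input_names=["input_ids", "segment_ids", "input_mask"],
    output_names=["logits"],
    opset_version=13,
)
```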

comaniac commented

Got it. Then IMHO we just need to explicitly say something like "generated from XXX under the Apache-2.0 license" in the README.

masahi (Contributor) commented Mar 14, 2022

OK, I added more details on the export process and a "licensed under Apache-2.0" blurb. The README is here: https://github.com/masahi/TLCBench/blob/bench-models/models/README.md

I think it is good to go.


comaniac left a comment


LGTM

masahi (Contributor) commented Mar 15, 2022

An example of how to use quantized BERT (running it requires apache/tvm#10596):
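The example script itself is not reproduced in this dump. A minimal sketch of importing the fake-quantized ONNX model and benchmarking it with Relay, where the file path, input names, shapes, and repeat count are assumptions:

```python
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

batch, seq_len = 1, 128  # assumed shapes
shapes = {
    "input_ids": (batch, seq_len),  # assumed input names
    "segment_ids": (batch, seq_len),
    "input_mask": (batch, seq_len),
}

# Import the fake-quantized ONNX model into Relay.
model = onnx.load("models/bert-base-qat.onnx")  # assumed path
mod, params = relay.frontend.from_onnx(model, shapes)

# Rewrite the fake-quantized graph into real integer ops (apache/tvm#10239).
mod = relay.transform.FakeQuantizationToInteger()(mod)

target = "cuda"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.device(target, 0)
rt = graph_executor.GraphModule(lib["default"](dev))
for name, shape in shapes.items():
    rt.set_input(name, np.zeros(shape, dtype="int64"))

# Prints a mean/median/max/min/std summary over the timed runs.
print(rt.benchmark(dev, repeat=10))
```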

Example output:

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Evaluate inference time cost with target cuda ... 
Execution time summary:                     
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  18.4024      18.2218      19.6602      18.1031       0.4598   

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Evaluate inference time cost with target cuda -libs=cublas ...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
   9.2887       9.2200       9.7776       9.1559       0.2160   
