
Quantization for speeding up #52

Closed
ymli39 opened this issue Mar 5, 2020 · 1 comment
ymli39 commented Mar 5, 2020

Hi, thanks for your code. Could you help me with the following question? I have incorporated your provided layers into a DenseUNet model; I have:

import brevitas.nn as qnn
from brevitas.core.quant import QuantType

conv = qnn.QuantConv2d(in_channels=params['num_channels'],
                       out_channels=params['num_filters'],
                       kernel_size=(params['kernel_h'], params['kernel_w']),
                       padding=(padding_h, padding_w),
                       stride=params['stride_conv'],
                       weight_quant_type=QuantType.INT,
                       weight_bit_width=8)

batchnorm = qnn.BatchNorm2dToQuantScaleBias(num_features=params['num_channels'],
                                            weight_quant_type=QuantType.INT,
                                            weight_bit_width=8)

relu = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=8, max_val=6)

sigmoid = qnn.QuantSigmoid(bit_width=8, quant_type=QuantType.INT)

I only replaced those layers with their qnn counterparts and did not change anything else. The model trains successfully, but the running time on both GPU and CPU is actually slower than the PyTorch nn implementation. Did I do anything wrong? Shouldn't the model speed up training and inference by about 4x?

ymli39 changed the title from "Quantization for speed up" to "Quantization for speeding up" on Mar 5, 2020
volcacius (Contributor) commented

Hello,

Glad to hear about your good training results. However, Brevitas is a library oriented towards research on quantization-aware (re)training; it doesn't take care of deployment. It's up to the user to export a trained model to some kind of optimized hw+sw backend. Our main open source backend (currently being developed) is FINN, which deploys quantized models as custom dataflow architectures on FPGAs.
The fact that inference is slower than torch.nn is expected: quantization-aware operations involve exposing a differentiable, integer-only datapath on top of floating point, which can be expensive.
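To make the overhead concrete, here is a minimal sketch of what a fake-quantized weight path looks like (not Brevitas's actual implementation; the fake_quantize helper below is hypothetical, for illustration only):

import torch

def fake_quantize(x, bit_width=8):
    # Simulate a signed integer grid in floating point: scale, round, clamp, rescale.
    qmax = 2 ** (bit_width - 1) - 1
    scale = x.abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    # The result is still a float tensor, so the following conv/matmul runs as an
    # ordinary floating-point kernel plus this extra work. QAT libraries also wrap
    # round() in a straight-through estimator so the path stays differentiable.
    return q * scale

w_q = fake_quantize(torch.randn(16, 3, 3, 3))

So every "quantized" layer still pays the full floating-point cost, plus the scaling, rounding, and clamping on top; the speedup only appears once the trained model is exported to a backend with real integer kernels.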
You might want to consider moving to PyTorch's official quantization tools. They won't be as good in terms of accuracy, but deployment to CPU/GPU is easier.
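For reference, a minimal eager-mode sketch of PyTorch's post-training static quantization workflow (TinyNet is a made-up stand-in for your model, and the 'fbgemm' qconfig targets x86 CPU inference):

import torch
import torch.nn as nn
import torch.quantization as tq

class TinyNet(nn.Module):
    # Hypothetical toy model standing in for the real network.
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # converts float input to int8
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # converts int8 output back to float
    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = tq.get_default_qconfig('fbgemm')  # x86 CPU backend
prepared = tq.prepare(model)                       # insert calibration observers
prepared(torch.randn(1, 3, 32, 32))                # run some calibration data through
int8_model = tq.convert(prepared)                  # swap in real int8 CPU kernels

After convert(), the model executes with actual int8 kernels on CPU, which is where the latency benefit comes from.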

Alessandro
