ymli39 changed the title from "Quantization for speed up" to "Quantization for speeding up" on Mar 5, 2020.
Glad to hear about your good training results. However, Brevitas is a library oriented towards research on quantization-aware (re)training; it doesn't take care of deployment. It's up to the user to export a trained model to some kind of optimized hw+sw backend. Our main open source backend (currently under development) is FINN, which deploys quantized models as custom dataflow architectures on FPGAs.
The fact that inference is slower than torch.nn is expected: quantization-aware operations expose a differentiable integer-only datapath on top of floating point, which can be expensive.
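To make the overhead concrete, here is a minimal sketch (not Brevitas code) of what "fake quantization" does during quantization-aware training: weights are scaled, rounded, and clamped to an int8 range, then dequantized back to float before the ordinary float convolution runs. The rounding and clamping are extra float operations on every forward pass, so nothing gets faster; the `fake_quant_int8` function and per-tensor max-abs scaling below are illustrative assumptions, not the library's exact scheme.

```python
import torch

def fake_quant_int8(w: torch.Tensor) -> torch.Tensor:
    # Per-tensor max-abs scale mapping the largest weight magnitude to 127.
    scale = w.abs().max() / 127.0
    # Simulate int8: scale, round, clamp to [-128, 127]...
    q = torch.clamp(torch.round(w / scale), -128, 127)
    # ...then dequantize back to float. The downstream conv/linear still
    # runs in floating point, so these are purely extra ops.
    return q * scale

w = torch.randn(3, 3)
wq = fake_quant_int8(w)
# wq is a float tensor restricted to 256 levels; the rounding error per
# element is at most half a quantization step (scale / 2).
print((w - wq).abs().max() <= w.abs().max() / 127.0)
```

Real speedups only come at deployment, when an exported backend replaces this float simulation with actual integer arithmetic.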
You might want to consider moving to PyTorch's official quantization tools. They won't be as good in terms of accuracy, but deployment to CPU/GPU is easier.
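For reference, PyTorch's built-in dynamic quantization is the lowest-effort entry point: it converts the weights of supported layers to int8 for CPU inference with one call. This is a minimal sketch on a toy model (the layer sizes are made up), not a drop-in for the DenseUNet above; conv-heavy models generally need static quantization with calibration instead, since dynamic quantization mainly targets `nn.Linear`/RNN layers.

```python
import torch
import torch.nn as nn
from torch.quantization import quantize_dynamic

# A toy float model standing in for any trained network.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# Replace Linear layers with int8 dynamically-quantized versions.
# Weights are quantized once; activations are quantized on the fly.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

out = qmodel(torch.randn(1, 64))
print(out.shape)  # torch.Size([1, 10])
```

Unlike fake quantization during training, the converted model actually executes int8 kernels on CPU, which is where the inference speedup comes from.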
Hi, thanks for your code. Could you help me with the following question? I have incorporated your provided layers into a DenseUNet model. I have:
conv = qnn.QuantConv2d(in_channels=params['num_channels'],
                       out_channels=params['num_filters'],
                       kernel_size=(params['kernel_h'], params['kernel_w']),
                       padding=(padding_h, padding_w),
                       stride=params['stride_conv'],
                       weight_quant_type=QuantType.INT,
                       weight_bit_width=8)
batchnorm = qnn.BatchNorm2dToQuantScaleBias(num_features=params['num_channels'],
                                            weight_quant_type=QuantType.INT,
                                            weight_bit_width=8)
relu = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=8, max_val=6)
sigmoid = qnn.QuantSigmoid(bit_width=8, quant_type=QuantType.INT)
I replaced those functions with their qnn equivalents and did not change anything else. The model trains successfully, but the running time on both GPU and CPU is actually slower than the PyTorch nn implementation. Did I do anything wrong? Shouldn't the model speed up training and inference by about 4x?