
Bug when set ENABLE_BIAS_QUANT = True #28

Closed · dathudeptrai opened this issue Feb 10, 2020 · 5 comments

dathudeptrai commented Feb 10, 2020

Hi @volcacius, thanks for your implementation. I hit a bug when setting ENABLE_BIAS_QUANT = True. Can you help me debug it?

/usr/local/lib/python3.6/dist-packages/brevitas/nn/quant_conv.py in forward(self, input)
    224 
    225         if self.compute_output_bit_width:
--> 226             assert input_bit_width is not None
    227             output_bit_width = self.max_output_bit_width(input_bit_width, quant_weight_bit_width)
    228         if self.compute_output_scale:

AssertionError: 
@volcacius (Contributor)

Hello,
You need to pass a QuantTensor to the QuantConv2d layer, which happens when you set return_quant_tensor=True on the activation that precedes it. For the very first conv layer, you should insert a quantized identity (a quantized hard tanh) right at the beginning of the network.
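
For reference, here is a minimal sketch of that wiring (module and argument names follow a recent Brevitas release and may differ from the version discussed in this issue; treat it as a sketch rather than the exact fix):

import torch.nn as nn
from brevitas.nn import QuantIdentity, QuantConv2d, QuantReLU

class TinyQuantNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Quantized identity at the input so the first conv receives a
        # QuantTensor carrying a bit width and scale.
        self.input_quant = QuantIdentity(bit_width=8, return_quant_tensor=True)
        self.conv1 = QuantConv2d(3, 16, kernel_size=3, weight_bit_width=8,
                                 return_quant_tensor=True)
        # The activation also returns a QuantTensor, so the following conv
        # can compute its output bit width and scale (required once bias
        # quantization is enabled).
        self.act1 = QuantReLU(bit_width=8, return_quant_tensor=True)
        self.conv2 = QuantConv2d(16, 16, kernel_size=3, weight_bit_width=8)

    def forward(self, x):
        x = self.input_quant(x)
        x = self.act1(self.conv1(x))
        return self.conv2(x)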

As a side note, I see you are working on a quantized MelGAN implementation. GANs can be quite tricky to quantize. We have a working 8 bit version internally that we are going to release at some point in the next couple of months.

Alessandro

dathudeptrai (Author) commented Feb 10, 2020

@volcacius, yeah, I see. I managed to quantize MelGAN to float16 on TFLite, and it runs 2x faster than real time. At 8 bit the accuracy drops a lot and there is a lot of white noise. I'm still investigating your implementation and the TFLite one; somehow the outputs of TFLite and your framework differ at 8 bit (32-bit and 16-bit are almost the same). If you know how your quantization procedure differs from TFLite's, please let me know. I thought it was because of the bias, since I wasn't applying fake quantization to it, but removing the bias didn't solve the problem.
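
For context, float16 post-training quantization in TFLite is just a converter setting; a minimal sketch (the SavedModel path below is a placeholder, not one taken from this thread):

import tensorflow as tf

# Convert a SavedModel of the generator to a float16-quantized TFLite model.
converter = tf.lite.TFLiteConverter.from_saved_model("melgan_generator_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()
with open("melgan_generator_fp16.tflite", "wb") as f:
    f.write(tflite_model)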

@volcacius (Contributor)

It really depends on how you are setting up the quantized layers.
In general TFLite is a great tool for production-oriented quantization, while Brevitas is oriented towards research, which is why it provides many more options.
For what it's worth, our internal results at 8 bit with MelGAN on LJSpeech are on par with floating-point quality. I'll be happy to share the details once the model is released.

dathudeptrai (Author) commented Feb 10, 2020

@volcacius Looking forward to your model. Note that my results on LJSpeech at 8 bit (based on this framework) are also on par with float32 (in PyTorch), but there are some differences after converting to TFLite. BTW, thanks again for your great implementation.

@volcacius (Contributor)

I see now, glad to hear about your good results and thanks for the positive feedback! Please cite us if you plan to release/publish them somewhere; I would really appreciate it.
On the export side, unfortunately there aren't any plans for a TFLite-compatible flow at the moment. We are working on a custom ONNX-based flow, but it's going to target deployment on our own FPGAs.
