why not quantize the activation of the last conv layer in a block #20

Closed
frankgt opened this issue Sep 27, 2021 · 3 comments

frankgt commented Sep 27, 2021

Hi,
Thanks for releasing your code, but I have a question about one implementation detail.
In quant_block.py, take the following code for ResNet-18 and ResNet-34 as an example: disable_act_quant is set to True for conv2, which disables quantization of the output of conv2.

class QuantBasicBlock(BaseQuantBlock):
    """
    Implementation of Quantized BasicBlock used in ResNet-18 and ResNet-34.
    """
    def __init__(self, basic_block: BasicBlock, weight_quant_params: dict = {}, act_quant_params: dict = {}):
        super().__init__(act_quant_params)
        self.conv1 = QuantModule(basic_block.conv1, weight_quant_params, act_quant_params)
        self.conv1.activation_function = basic_block.relu1
        self.conv2 = QuantModule(basic_block.conv2, weight_quant_params, act_quant_params, disable_act_quant=True)

        # modify the activation function to ReLU
        self.activation_function = basic_block.relu2

        if basic_block.downsample is None:
            self.downsample = None
        else:
            self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params,
                                          disable_act_quant=True)
        # copying all attributes in original block
        self.stride = basic_block.stride
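
For reference, here is a minimal sketch of how a disable_act_quant flag of this kind typically takes effect inside a quantized conv wrapper's forward pass. This is an illustrative simplification and not the repository's verbatim QuantModule; the class name QuantModuleSketch and the quantizer arguments are placeholders.

import torch.nn as nn
import torch.nn.functional as F

class QuantModuleSketch(nn.Module):
    """Illustrative stand-in for QuantModule: a conv with fake-quantized weights
    and an optional activation quantizer on its output."""
    def __init__(self, conv: nn.Conv2d, weight_quantizer: nn.Module,
                 act_quantizer: nn.Module, disable_act_quant: bool = False):
        super().__init__()
        self.conv = conv
        self.weight_quantizer = weight_quantizer   # fake-quantizes the weights
        self.act_quantizer = act_quantizer         # fake-quantizes the output activations
        self.activation_function = nn.Identity()   # the block replaces this with ReLU
        self.disable_act_quant = disable_act_quant

    def forward(self, x):
        w = self.weight_quantizer(self.conv.weight)
        out = F.conv2d(x, w, self.conv.bias, self.conv.stride,
                       self.conv.padding, self.conv.dilation, self.conv.groups)
        out = self.activation_function(out)
        if self.disable_act_quant:
            # conv2 / downsample case above: leave this output in full precision;
            # quantization happens later, after the block's element-wise add.
            return out
        return self.act_quantizer(out)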

This gives a boost in accuracy. Below are the results I get using your code and the same ImageNet dataset used in the paper; [1] and [2] denote the modifications I made to the original code.

[screenshot of the accuracy results]

[1]: quant_block.py → QuantBasicBlock → __init__: in self.conv2 = QuantModule(..., disable_act_quant=True) and self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params, disable_act_quant=True), change disable_act_quant from True to False;
[2]: quant_block.py → QuantInvertedResidual → __init__: in self.conv = nn.Sequential(..., QuantModule(..., disable_act_quant=True)), change disable_act_quant from True to False.

However, I do not think this is applicable to most NPUs, which quantize the output of every conv layer.
So why not quantize the activation of the last conv layer in a block? Is there a particular reason for this?
Also, for the methods you compared against in your paper, have you checked whether they do the same thing as you do?

yhhhli (Owner) commented Sep 28, 2021

Hi, thanks for your comment.

Indeed, there is some disagreement on where to insert activation quantization nodes. As you point out, some hardware can use this computation graph (TensorRT, for example) while other hardware cannot. However, many works choose to quantize only the input and the weights of a Conv2d layer and do not handle the shortcut branch; under that setting, we believe our method would have lower accuracy. It is not possible to find a single setting shared by all the methods we compared against.
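
To make the placement concrete, the block-level convention under discussion looks roughly like the following. This is only an illustrative sketch, not the repository's verbatim forward; it assumes the block-level act_quantizer lives in BaseQuantBlock (an assumption based on the __init__ quoted above).

def basic_block_forward(block, x):
    # Sketch of the block-level placement: with disable_act_quant=True on conv2
    # and the downsample branch, their outputs stay in full precision until
    # after the element-wise add, and a single activation quantizer runs there.
    residual = x if block.downsample is None else block.downsample(x)
    out = block.conv1(x)        # conv1 output quantized inside its QuantModule
    out = block.conv2(out)      # conv2 output NOT quantized here
    out = block.activation_function(out + residual)   # relu2 after the add
    return block.act_quantizer(out)                   # one quantization node per block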

frankgt commented Sep 29, 2021

Hi,
Thanks for your reply.
Indeed, the quantization scheme depends on the implementation of the NPU hardware.
Another question: I am running your code on a single 2080Ti GPU, and a CUDA out-of-memory error occurs for the deeper networks with a large batch size (64). I have to use a smaller batch size, 32 for MobileNetV2 or even 16 for ResNet-50.
Do you have the same problem? Any suggestions?

yhhhli (Owner) commented Sep 29, 2021

Good question.

In fact, GPU memory is mainly consumed by storing the inputs and outputs of each block, so there are two ways to deal with this:

  1. Use multi-GPU reconstruction; we release code for this based on torch.distributed.
  2. Reduce the size of the calibration set, for example to 512 or 768 samples (a rough sketch of this follows below).
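
As a rough illustration of option 2, a smaller calibration set can be assembled as below. The function name build_calibration_set and its arguments are placeholders for this sketch, not the repository's actual API.

import torch
from torch.utils.data import DataLoader

def build_calibration_set(train_dataset, num_samples=512, batch_size=32):
    """Collect a small, fixed calibration tensor. Fewer samples mean less GPU
    memory when block inputs/outputs are cached during reconstruction."""
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    batches, collected = [], 0
    for images, _ in loader:        # assumes a standard (image, label) dataset
        batches.append(images)
        collected += images.size(0)
        if collected >= num_samples:
            break
    return torch.cat(batches, dim=0)[:num_samples]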

frankgt closed this as completed Sep 30, 2021