Hi,
Thanks for releasing your code, but I have one question regarding a detail of the implementation.
In quant_block.py, take the following code for ResNet-18 and ResNet-34 as an example.
disable_act_quant is set to True for conv2, which disables quantization of conv2's output.
class QuantBasicBlock(BaseQuantBlock):
    """
    Implementation of Quantized BasicBlock used in ResNet-18 and ResNet-34.
    """
    def __init__(self, basic_block: BasicBlock, weight_quant_params: dict = {}, act_quant_params: dict = {}):
        super().__init__(act_quant_params)
        self.conv1 = QuantModule(basic_block.conv1, weight_quant_params, act_quant_params)
        self.conv1.activation_function = basic_block.relu1
        self.conv2 = QuantModule(basic_block.conv2, weight_quant_params, act_quant_params, disable_act_quant=True)
        # modify the activation function to ReLU
        self.activation_function = basic_block.relu2
        if basic_block.downsample is None:
            self.downsample = None
        else:
            self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params,
                                          disable_act_quant=True)
        # copying all attributes in original block
        self.stride = basic_block.stride
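For context, my understanding of the block-level forward pass is roughly the sketch below: the activation quantizer is applied once, after the residual addition, which is why conv2's own output quantization can be disabled. Attribute names such as use_act_quant and act_quantizer are assumptions about the block's internals, not quoted from the repository.

```python
def forward(self, x):
    # Shortcut path; the downsample conv's output is likewise left unquantized.
    residual = x if self.downsample is None else self.downsample(x)
    out = self.conv1(x)          # conv1 output is quantized inside QuantModule
    out = self.conv2(out)        # conv2 output is NOT quantized (disable_act_quant=True)
    out += residual              # residual add operates on the unquantized tensor
    out = self.activation_function(out)
    if self.use_act_quant:
        out = self.act_quantizer(out)   # single activation-quantization step after the add
    return out
```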
Disabling this quantization causes a boost in accuracy. The following are the results I got using your code and the same ImageNet dataset used in the paper; [1] and [2] denote the modifications I made to the original code (the changed calls are sketched after this list):
[1]: quant_block.py → QuantBasicBlock → __init__: in self.conv2 = QuantModule(..., disable_act_quant=True) and self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params, disable_act_quant=True), change disable_act_quant from True to False;
[2]: quant_block.py → QuantInvertedResidual → __init__: in self.conv = nn.Sequential(..., QuantModule(..., disable_act_quant=True)), change disable_act_quant from True to False.
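For clarity, the change in [1] simply flips the flag on the constructor calls already shown above, for example (arguments abbreviated exactly as in the snippet):

```python
# [1] quant_block.py -> QuantBasicBlock.__init__, with output quantization re-enabled
self.conv2 = QuantModule(basic_block.conv2, weight_quant_params, act_quant_params,
                         disable_act_quant=False)
self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params,
                              disable_act_quant=False)
# [2] is the analogous flag change on the last QuantModule inside QuantInvertedResidual.__init__
```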
But I do not think this setting is applicable to most NPUs, which quantize the output of every conv layer.
So why not quantize the activation of the last conv layer in a block? Is there any particular reason for this?
Also, for the methods you compared with in your paper, have you checked whether they do the same thing as you do or not?
Indeed, there is some disagreement on how to insert activation quantization nodes. As you point out, some hardware can use this computation graph (like TensorRT) and some cannot. However, many works also choose to quantize the input and the weights of a Conv2d layer and do not deal with the shortcut layer. In that setting, we believe our method would have lower accuracy. It is not possible to find a single setting shared by all the methods we compared.
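To make the two conventions concrete, the toy sketch below (not the repository's API; fake_quant is a hypothetical helper) contrasts quantizing every conv output, including the tensor that feeds the residual add, with quantizing only each conv's input while leaving the shortcut and the add in higher precision.

```python
import torch
import torch.nn as nn

def fake_quant(x, n_bits=8):
    # Toy symmetric uniform fake-quantizer; it only marks where a quantization
    # node sits in the graph.
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max() / qmax + 1e-8
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

# Convention A: quantize every conv output, including the tensor entering the add
# (the NPU-friendly graph described in the question).
def block_quant_outputs(x, conv1, conv2):
    out = fake_quant(torch.relu(conv1(x)))
    out = fake_quant(conv2(out))               # conv2 output quantized before the add
    return fake_quant(torch.relu(out + x))

# Convention B: quantize only each conv's input (and, in practice, its weights);
# the shortcut and the residual add stay in higher precision.
def block_quant_inputs(x, conv1, conv2):
    out = torch.relu(conv1(fake_quant(x)))
    out = conv2(fake_quant(out))               # conv2 output left unquantized
    return torch.relu(out + x)                 # add runs on full-precision tensors

conv1, conv2 = nn.Conv2d(16, 16, 3, padding=1), nn.Conv2d(16, 16, 3, padding=1)
x = torch.randn(1, 16, 8, 8)
print(block_quant_outputs(x, conv1, conv2).shape, block_quant_inputs(x, conv1, conv2).shape)
```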
Hi
Thanks for your reply.
Indeed, the quantization process depends on the implementation of the NPU hardware.
Another question: I am running your code on a single 2080Ti GPU, and a CUDA out-of-memory error occurs for the deeper networks with a large batch size (64). I have to use a smaller batch size, 32 for MobileNetV2 or even 16 for ResNet-50.
Did you have the same problem? Any suggestions for that?