
dorefa'ing cifar-resnet #69

Closed
wants to merge 1 commit into from

Conversation

@a-maci commented Dec 18, 2016

Adding DoReFa to cifar10-resnet.

  1. I'm not sure where I should add the fg(.) calls. I've left them as comments (see lines 85, 87).
  2. Does reducing the precision of the pooling layer make sense (line 90)? For now I've left this as a comment as well.
    Could you please check it for correctness?

I will upload the convergence curves along with these edits if things look OK to you.
I will do the same for imagenet-resnet and send you a pull request as well.

Thanks

@ppwwyyxx (Collaborator)

Hi, thanks for your PR!

  1. Your implementation, with BITA=32 and no fg, is not DoReFa-Net; it's just BWN (Binary Weight Network). It would still be good to have a BWN example, though (see the sketch after this list).

  2. We don't include examples without a meaningful performance number (except for examples that illustrate a different type of problem); otherwise it would just be a combination of code snippets from other examples, which doesn't teach others anything new.
    By "meaningful" I mean either a state-of-the-art result or a reproduced result from some paper. If you could reproduce the Cifar performance of BWN (binarized weights only) or BNN (binarized weights and activations), that would make it a good example.

  3. Same for ImageNet. We currently have a ResNet-18 model on ImageNet with 1-bit weights, 4-bit activations and 60% accuracy, which might get released soon. You can contribute if you reach similar performance with ResNet-18.

  4. About the architecture: we chose the other ResNet variant, which uses convolution instead of pooling to increase channels, so we don't have such problems.
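
For reference, a minimal sketch of how the bit widths map onto what gets quantized, assuming the get_dorefa helper from the DoReFa-Net example folder (the names and signature are copied from alexnet-dorefa.py; treat the details as an assumption, not the intended recipe):

# assumes examples/DoReFa-Net/dorefa.py is importable
from dorefa import get_dorefa

BITW, BITA, BITG = 1, 32, 32               # 1-bit weights, full-precision activations/gradients
fw, fa, fg = get_dorefa(BITW, BITA, BITG)  # fw: weights, fa: activations, fg: gradients
# With BITA=32, fa is effectively a pass-through, and if fg is never applied either,
# only the weights end up quantized -- i.e. BWN rather than the full DoReFa-Net.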

@a-maci (Author) commented Dec 18, 2016

Thanks.
I will work on (1) and (3).

Question: How should I include fg(.) and fa(.) in the implementation? For example, is the snippet below correct (based on cifar10-resnet)?

Ex:
c1 = Conv2D('conv1', b1, out_channel, stride=stride1, nl=BNReLU)
c1 = activate(c1)  # quantizing the activations of conv1
c1 = fg(c1)        # quantizing the gradients for the conv1 layer
c2 = Conv2D('conv2', c1, out_channel)
c2 = fg(c2)        # quantizing the gradients for the conv2 layer; not quantizing its activations

In alexnet-dorefa, for example, I see this:
.Conv2D('conv2', 384, 3)
.apply(fg)
.BatchNorm('bn2')
.MaxPooling('pool2', 3, 2, padding='SAME')
.apply(activate)

I don't get how the activations of conv2 alone (before bn2) get quantized.
Does the above snippet quantize the output after the max-pool, i.e. CONV --> BN --> MAXPOOL --> QUANTIZE?

I was thinking of quantizing on a per-layer basis: CONV --> QUANTIZE --> BN --> QUANTIZE --> MAXPOOL --> QUANTIZE. Is my understanding incorrect?

Thanks!

@ppwwyyxx (Collaborator)

The only layers that get quantized in DoReFa-Net are Conv and FC.
In general, fg is added directly after a Conv (with no non-linearity in between) and fa is added directly before a Conv, so that both the forward and backward passes of that Conv are quantized.
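
To make the placement concrete, here is a hypothetical rearrangement of the residual-block snippet above (not code from the repo; it reuses the names from that snippet, assumes the usual cifar10-resnet imports, and uses activate for the activation-quantization step as in alexnet-dorefa):

# nl=tf.identity is an assumption so that fg sits directly on the conv output
c1 = activate(b1)                  # fa: quantize the activations right before conv1
c1 = Conv2D('conv1', c1, out_channel, stride=stride1, nl=tf.identity)
c1 = fg(c1)                        # fg: quantize conv1's gradient, before BN or any non-linearity
c1 = BatchNorm('bn1', c1)
c1 = activate(c1)                  # fa: quantize the activations right before conv2
c2 = Conv2D('conv2', c1, out_channel, nl=tf.identity)
c2 = fg(c2)                        # fg: quantize conv2's gradient
c2 = BatchNorm('bn2', c2)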

@a-maci (Author) commented Dec 19, 2016

Thanks. You can close this PR. I will update when I have meaningful results/examples.
