
dorefa'ing cifar-resnet #69

Closed
wants to merge 1 commit into from

Conversation

@a-maci commented Dec 18, 2016

Adding DoReFa to cifar10-resnet.

  1. I'm not sure where I should add the fg(.) calls. I've left them as comments (see lines 85, 87).
  2. Does reducing the precision of the pooling layer make sense (line 90)? For now I've left this as a comment as well.
    Could you please check it for correctness?

I will upload the convergence curves along with these edits if things look OK to you.
I will do the same for imagenet-resnet and send you a pull request as well.

Thanks

@ppwwyyxx (Collaborator)

Hi, thanks for your PR!

  1. Your implementation, with BITA=32 and no fg, is not DoReFa-Net; it's just BWN (Binary Weight Network). It would still be good to have a BWN example, though (see the sketch after this list).

  2. We don't include examples without a meaningful performance number (except for examples that illustrate a different type of problem); otherwise it would just be a combination of code snippets from other examples, which doesn't teach others anything new.
    By "meaningful" I mean either a state-of-the-art result or a reproduced result from some paper. If you could reproduce the Cifar performance of BWN (binarized weights only) or BNN (binarized weights and activations), that would make it a good example.

  3. Same for ImageNet. We currently have a ResNet-18 model on ImageNet with 1-bit weights, 4-bit activations and 60% accuracy, which might get released soon. You can contribute if you reach similar performance with ResNet-18.

  4. About the architecture: we chose the other ResNet variant, which uses convolution instead of pooling to increase channels, so we don't have such problems.
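
For reference, a minimal sketch of how the bit widths map onto what gets quantized, assuming the get_dorefa helper from the DoReFa-Net example folder (the names and signature are copied from alexnet-dorefa.py; treat the details as an assumption, not the intended recipe):

# assumes examples/DoReFa-Net/dorefa.py is importable
from dorefa import get_dorefa

BITW, BITA, BITG = 1, 32, 32               # 1-bit weights, full-precision activations/gradients
fw, fa, fg = get_dorefa(BITW, BITA, BITG)  # fw: weights, fa: activations, fg: gradients
# With BITA=32, fa is effectively a pass-through, and if fg is never applied either,
# only the weights end up quantized -- i.e. BWN rather than the full DoReFa-Net.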

@a-maci (Author) commented Dec 18, 2016

Thanks.
I will work on (1) and (3).

Question: How should I include fg(.) and fa(.) in the implementation? For example, is the snippet below correct (based on cifar10-resnet)?

Ex:
c1 = Conv2D('conv1', b1, out_channel, stride=stride1, nl=BNReLU)
c1 = activate(c1)  # quantizing the activations of conv1
c1 = fg(c1)        # quantizing the gradients for the conv1 layer
c2 = Conv2D('conv2', c1, out_channel)
c2 = fg(c2)        # quantizing the gradients for the conv2 layer; not quantizing its activations

In alexnet-dorefa, for example, I see this:
.Conv2D('conv2', 384, 3)
.apply(fg)
.BatchNorm('bn2')
.MaxPooling('pool2', 3, 2, padding='SAME')
.apply(activate)

I don't get how the activations of conv2 alone (before bn2) get quantized.
Does the above snippet quantize the output after the max-pool, i.e. CONV --> BN --> MAXPOOL --> QUANTIZE?

I was thinking of quantizing on a per-layer basis: CONV --> QUANTIZE --> BN --> QUANTIZE --> MAXPOOL --> QUANTIZE. Is my understanding incorrect?

Thanks!

@ppwwyyxx (Collaborator)

The only layers that get quantized in DoReFa-Net are Conv and FC.
In general, fg is added directly after a Conv (with no non-linearity in between) and fa is added directly before a Conv, so that both the forward and backward passes of that Conv are quantized.
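
To make the placement concrete, here is a hypothetical rearrangement of the residual-block snippet above (not code from the repo; it reuses the names from that snippet, assumes the usual cifar10-resnet imports, and uses activate for the activation-quantization step as in alexnet-dorefa):

# nl=tf.identity is an assumption so that fg sits directly on the conv output
c1 = activate(b1)                  # fa: quantize the activations right before conv1
c1 = Conv2D('conv1', c1, out_channel, stride=stride1, nl=tf.identity)
c1 = fg(c1)                        # fg: quantize conv1's gradient, before BN or any non-linearity
c1 = BatchNorm('bn1', c1)
c1 = activate(c1)                  # fa: quantize the activations right before conv2
c2 = Conv2D('conv2', c1, out_channel, nl=tf.identity)
c2 = fg(c2)                        # fg: quantize conv2's gradient
c2 = BatchNorm('bn2', c2)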

@a-maci (Author) commented Dec 19, 2016

Thanks. You can close this PR. I will update when I have meaningful results/examples.
