
Shufflenet as backbone #67

Closed
YellowKyu opened this issue Feb 11, 2019 · 4 comments
@YellowKyu

Hey guys,

Has anyone tried to replace the backbone with something like ShuffleNet or MobileNet?
Since the Xception model is not released, it could be a good alternative to improve inference speed!
I'm trying to plug architecture.py from https://github.com/TropComplique/shufflenet-v2-tensorflow into network_desp.py, but during training the rpn_cls_loss oscillates between 0.5 and 0.9 without decreasing further...

Thanks for your help!
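
For reference, a minimal sketch of how such a backbone swap could be wired up, with plain strided convs standing in for the real ShuffleNet-v2 stages from architecture.py (the backbone_endpoints helper, stage names, and channel widths here are illustrative, not the repo's actual API):

import tensorflow as tf

def backbone_endpoints(images):
    # Stand-in stages (plain strided convs) just to show the wiring; in
    # practice these would be the ShuffleNet-v2 stages from architecture.py.
    end_points = {}
    net = tf.layers.conv2d(images, 24, 3, strides=2, padding='same', name='Conv1')
    net = tf.layers.max_pooling2d(net, 3, strides=2, padding='same')               # stride 4
    net = tf.layers.conv2d(net, 116, 3, strides=2, padding='same', name='Stage2')  # stride 8
    end_points['Stage2'] = net
    net = tf.layers.conv2d(net, 232, 3, strides=2, padding='same', name='Stage3')  # stride 16
    end_points['Stage3'] = net
    net = tf.layers.conv2d(net, 464, 3, strides=2, padding='same', name='Stage4')  # stride 32
    end_points['Stage4'] = net
    return end_points

# network_desp.py would then pick an endpoint to feed the RPN, e.g.:
# rpn_features = backbone_endpoints(images)['Stage3']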

@karansomaiah

Hey @YellowKyu
I did try mobilenet_v1, but I hit NaN loss very early in training. Reducing the learning rate didn't help. I also tried experimenting with the Xception-like network mentioned in the paper and faced the same issue. Let me know if you find something.

  • Karan
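
A common first check when the loss goes NaN this early, and a lower learning rate doesn't cure it, is gradient explosion. A minimal TF1-style sketch of global-norm gradient clipping, on a toy loss rather than the repo's actual training graph:

import tensorflow as tf

# Toy regression graph standing in for the detector's total loss.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(x, 1)
total_loss = tf.reduce_mean(tf.squared_difference(pred, y))

optimizer = tf.train.MomentumOptimizer(learning_rate=1e-3, momentum=0.9)
grads_and_vars = optimizer.compute_gradients(total_loss)
grads, variables = zip(*grads_and_vars)
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=10.0)  # cap the global gradient norm
train_op = optimizer.apply_gradients(list(zip(clipped, variables)))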

@YellowKyu
Author

@karansomaiah Hi there! Which feature maps are you feeding to the RPN and the large separable convolution? I get a high loss (around 10-15) with ShuffleNet, but no NaN...

@karansomaiah

Have you solved it? @YellowKyu

These are the blocks:

# Each tuple configures one bottleneck unit; following the repo's
# resnet_utils convention these appear to be (depth, depth_bottleneck, stride, rate).
blocks = [
    resnet_utils.Block('block1', bottleneck,
                       [(144, 24, 2, 1)] + [(144, 24, 1, 1)] * 3),
    resnet_utils.Block('block2', bottleneck,
                       [(288, 144, 2, 1)] + [(288, 144, 1, 1)] * 7),
    resnet_utils.Block('block3', bottleneck,
                       [(576, 288, 1, 1)] + [(576, 288, 1, 1)] * 3),
]

And I was passing the block2 features to the RPN.
Also, digging into the PSAlign code, I suspect the loss is high because of the hard-coded spatial scale for resnet101 in the original code. Scaling appropriately for the reduced size of the feature maps should fix the issue.
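
To make that concrete, a minimal sketch of deriving the scale from the backbone's output stride instead of hard-coding it (ps_roi_align and its spatial_scale argument below are illustrative stand-ins for the repo's PSAlign op):

def spatial_scale_for(feature_stride):
    # The original code assumes stride-16 resnet101 features (scale 1/16);
    # a stride-32 map would need 1/32, and so on.
    return 1.0 / feature_stride

# Hypothetical call site:
# psroi_features = ps_roi_align(features, rois,
#                               spatial_scale=spatial_scale_for(16))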

@YellowKyu
Author

Hi @karansomaiah,

For MobileNet, I fed Conv8_pointwise to the RPN and Conv11_pointwise to the large separable conv, and it converged nicely.
For ShuffleNet, I also managed to make it converge, but I only used Stage3 for both the RPN and the large separable conv. I noticed the issue is related to the resolution of my feature maps, which matches what you discovered with PSAlign. Did you try to modify PSAlign?
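
For anyone wiring this up with the tf-slim MobileNet (assuming nets/mobilenet_v1.py from tensorflow/models, where the pointwise layers are named 'Conv2d_<n>_pointwise'), the endpoint selection could look like:

import tensorflow as tf
from nets import mobilenet_v1  # slim nets from the tensorflow/models repo

images = tf.placeholder(tf.float32, [1, 800, 1200, 3])
_, end_points = mobilenet_v1.mobilenet_v1_base(
    images, final_endpoint='Conv2d_13_pointwise')

rpn_features = end_points['Conv2d_8_pointwise']    # fed to the RPN
head_features = end_points['Conv2d_11_pointwise']  # fed to the large separable conv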
