Test Accuracy Stagnates #4

Closed
jain-avi opened this issue Jun 15, 2018 · 7 comments

Comments

@jain-avi

Can you tell me whether your training and testing accuracies always tracked each other? I am implementing a smaller, modified version of the network you coded, and my test accuracy seems to have stagnated at 81%.
Also, I think you have coded a different architecture: you add the output of the pool layer as well as the output of the pool+conv layers to the upsampled input, while the original architecture only adds the pool+conv output to the upsampled layer. Is that making all the difference?
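To make the difference concrete, here is a minimal sketch of the two variants I mean, assuming a PyTorch-style soft-mask down/up path; all class and argument names here are my own, not taken from this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskBranchFragment(nn.Module):
    """Toy fragment of a soft-mask down/up path, only to illustrate the add."""

    def __init__(self, channels, extra_skip_add=True):
        super().__init__()
        self.extra_skip_add = extra_skip_add
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # stand-in for the residual/conv block applied after pooling
        self.block = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        pooled = self.pool(x)            # output of the pool layer
        processed = self.block(pooled)   # output of pool + conv
        up = F.interpolate(processed, size=x.shape[2:],
                           mode='bilinear', align_corners=False)
        if self.extra_skip_add:
            # variant I see in this repo: the pooled path is added back as well
            skip = F.interpolate(pooled, size=x.shape[2:],
                                 mode='bilinear', align_corners=False)
            return up + skip
        # variant I read from the paper / Caffe prototxt: only the pool+conv path
        return up

# quick shape check
x = torch.randn(1, 16, 32, 32)
print(MaskBranchFragment(16, extra_skip_add=True)(x).shape)   # torch.Size([1, 16, 32, 32])
print(MaskBranchFragment(16, extra_skip_add=False)(x).shape)  # torch.Size([1, 16, 32, 32])
```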

@tengshaofeng
Owner

@Neo96Mav , which network did you use? Or did you modify the network yourself based on my code?

@jain-avi
Author

I used your network and the official Caffe network for reference, and implemented my own small network. I am not using attention modules at 4x4 because I feel those feature maps are too small, and I am only using one attention module at 8x8. My network is relatively small, and it's for CIFAR images only.
Can you let me know the intuition behind this?
[screenshot of the relevant code]
You have added the output of the residual block, as well as the output of the skip connection, to the upsampled layer!

@tengshaofeng
Owner

@Neo96Mav , this follows the Caffe network; I think the extra add is there to carry more detailed information forward. You can remove it to test its effectiveness.
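(In terms of the sketch posted above, constructing the fragment with extra_skip_add=False would correspond to removing that add.)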

@josianerodrigues

Hi @Neo96Mav,
Did you test the model using only one 8x8 Attention module? Was the accuracy better?

@jain-avi
Author

jain-avi commented Jul 5, 2018

Hi @josianerodrigues , I added the 4x4 attention module as well. I am stuck at 89.5% accuracy. Maybe my model is not big enough, or I am not using the exact same configuration, but I feel that should not have affected it so much. @tengshaofeng Do you have any idea why we can't match the authors' performance?

@tengshaofeng
Owner

@Neo96Mav , the paper only gives the architecture details of Attention-92 for ImageNet with 224x224 input, not for CIFAR-10, so I built the net ResidualAttentionModel_92_32input following my own understanding.
I have tested it on the CIFAR-10 test set; the result is as follows:
Accuracy of the model on the test images: 0.9354

Maybe some details are not right. You can refer to the data preprocessing in the paper and keep it the same as the authors', or you can tune the hyperparameters for better performance. You can also remove the add operation to test the network.
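As a concrete starting point, here is a minimal sketch of the common CIFAR-10 preprocessing (pad-4 random crop, horizontal flip, per-channel normalization); the mean/std values below are the widely used CIFAR-10 statistics, not numbers taken from the paper, so substitute the authors' exact preprocessing if it differs:

```python
import torchvision.transforms as transforms

# Commonly used CIFAR-10 channel statistics (assumed, not from the paper).
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # pad 4 pixels, random 32x32 crop
    transforms.RandomHorizontalFlip(),      # random left-right flip
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
```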

@tengshaofeng
Owner

@Neo96Mav @josianerodrigues
The result is now 0.954.
