
Strange jump in training loss and accuracy #30

Closed · ashleylid opened this issue May 11, 2018 · 4 comments


ashleylid commented May 11, 2018

Hi. Has anyone had this type of behavior during training? I have just cloned the repo and run it without changing anything.

[screenshots: capture1, capture — training curves]

EDIT: This is what it looks like when it has completely finished training.

[screenshot: unreal3_96x96_1]

yhyu13 (Collaborator) commented May 14, 2018

@kleinash

The margin in the margin loss becomes stricter with every epoch of training, so the loss function can increase. (For instance, a capsule vector of length 0.7 produces zero error when the margin for a positive sample is 0.6, but it produces an error once the margin is 0.8.) I hope this explains the "steps" you saw in the training loss. Also, see #31: we turned off weight decay.
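In code, the same check (a minimal sketch, assuming the squared-hinge form max(0, m - length)^2 for the positive-class term; illustrative only, not this repo's code):

```python
# Minimal sketch of the length-0.7 example above (illustrative only).
length = 0.7  # 2-norm of the positive class's capsule output
for m in (0.6, 0.8):
    loss = max(0.0, m - length) ** 2
    print(f"margin={m}: loss={loss:.2f}")  # 0.6 -> 0.00 (no error), 0.8 -> 0.01
```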

ashleylid (Author) commented May 14, 2018

Thank you. I will have to review your answer a few more times to try and wrap my head around it.

Would you mind going into that a little more for me? I don't really understand your answer.

ashleylid reopened this May 14, 2018
yhyu13 (Collaborator) commented May 15, 2018

@kleinash

Sorry for the unclear explanation. Your persistence is much appreciated.

If you take a look at equation (5) of the Matrix Capsule paper, notice that a_t means the 2-norm of the capsule that represents the correct class, and a_i means the 2-norm of a capsule that represents a wrong class.

Let's plug in some numbers: a_t = 0.9, a_i = 0.1 for all wrong classes (10 classes in total, 9 wrong classes), and m = 0.2. In this case, the spread loss for each i is (max(0, 0.2 - (0.9 - 0.1)))^2 = 0; the total loss is the sum of the spread loss over the wrong classes, thus 0. In this case, the network is good enough.

If m = 0.9 instead, the spread loss for each i is (max(0, 0.9 - (0.9 - 0.1)))^2 = 0.01, so the total loss is 0.09. Thus, as the margin increases, the network needs to enlarge the spread between the correct class and all the wrong classes.
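Here is a minimal NumPy sketch of that calculation (an illustration of the formula, not this repo's implementation); it reproduces the jump from 0 to ~0.09 for the same sample:

```python
import numpy as np

def spread_loss(a_t, a_wrong, m):
    # Sum of (max(0, m - (a_t - a_i)))^2 over the wrong classes i.
    return float(np.sum(np.maximum(0.0, m - (a_t - a_wrong)) ** 2))

a_t = 0.9                  # activation (2-norm) of the correct class
a_wrong = np.full(9, 0.1)  # 9 wrong classes, each with activation 0.1

print(spread_loss(a_t, a_wrong, m=0.2))  # 0.0   -> this sample adds no loss
print(spread_loss(a_t, a_wrong, m=0.9))  # ~0.09 -> same sample is now penalized
```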

When m increases at each epoch, the network won't change immediately, but some samples that had zero loss in the previous epoch now have positive loss, which causes the loss function to jump a little. Eventually, though, the loss drops again.

ashleylid (Author) commented

Ok! Thank you. I think it's eq. (3) you are referring to?
