Strange jump in training loss and accuracy #30
@kleinash The margin in the margin loss becomes stricter with every epoch of training, so the loss function can increase. (For instance, a vector of length 0.7 produces zero error when the margin for the positive sample is 0.6, but it produces an error once the margin is 0.8.) I hope this explains the "steps" you saw in the training loss. Also, see #31, where we turned off weight decay.
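A minimal sketch of what such a margin schedule could look like; the linear ramp from 0.2 to 0.9 and the step-based formulation are assumptions for illustration, not this repo's verified settings:

```python
# Hypothetical margin schedule: ramps linearly from m_min to m_max over training.
# The exact start/end values and schedule shape are assumptions.
def spread_margin(step, total_steps, m_min=0.2, m_max=0.9):
    """Return the spread-loss margin m for the current training step."""
    frac = min(step / float(total_steps), 1.0)
    return m_min + (m_max - m_min) * frac
```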
Thank you. I will have to review your answer a few more times to try and wrap my head around it. Would you mind going into that a little more for me? I don't really understand your answer.
@kleinash Sorry for an unclear explanation. Your persistence is much appreciated. If you take a look at equation (5) of the Matrix Capsule paper, notice that the spread loss for each wrong class i is (max(0, m - (a_t - a_i)))^2. Let's plug in some numbers: a_t = 0.9, a_i = 0.1 for all wrong classes (10 classes in total, 9 wrong classes), m = 0.2. In this case, the spread loss for each i is (max(0, 0.2-(0.9-0.1)))^2 = 0, and the total loss is the sum of the spread loss over the wrong classes, thus 0. In this case, the network is already good enough. If m = 0.9 instead, the spread loss for each i is (max(0, 0.9-(0.9-0.1)))^2 = 0.01, and the total loss is 0.09. So as the margin increases, the network needs to enlarge the spread between the correct class and all wrong classes. When m increases at each epoch, the network does not change immediately, but some samples that had zero loss in the previous epoch now have positive loss, which causes the loss function to jump a little. Eventually, though, the loss drops again.
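A small numerical check of the example above, assuming the spread-loss form described in this thread (sum over wrong classes of (max(0, m - (a_t - a_i)))^2); the function name and use of NumPy are just for illustration:

```python
import numpy as np

def spread_loss(a_t, a_wrong, m):
    """Spread loss: sum over wrong classes of max(0, m - (a_t - a_i))^2."""
    return float(np.sum(np.maximum(0.0, m - (a_t - a_wrong)) ** 2))

a_t = 0.9                   # activation of the target class
a_wrong = np.full(9, 0.1)   # activations of the 9 wrong classes

print(spread_loss(a_t, a_wrong, m=0.2))  # 0.0  -- every margin is satisfied
print(spread_loss(a_t, a_wrong, m=0.9))  # 0.09 -- 9 * (0.9 - 0.8)^2
```

This reproduces the jump described above: the same activations give zero loss at m = 0.2 but a positive loss of 0.09 once the margin has grown to 0.9.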
Ok! Thank you. I think it's Eq. 3 you are referring to?
Hi. Has anyone had this type of behavior during training? I have just cloned the repo and run it without changing anything.
EDIT: This is what it looks like when it has completely finished training.