
Strange jump in training loss and accuracy #30

Closed · ashleylid opened this issue May 11, 2018 · 4 comments


ashleylid commented May 11, 2018

Hi. Has anyone had this type of behavior during training? I have just cloned the repo and run it without changing anything.

[screenshots: capture1, capture — training curves]

EDIT: This is what it looks like when it has completely finished training.

[screenshot: unreal3_96x96_1]

yhyu13 (Collaborator) commented May 14, 2018

@kleinash

The margin in the margin loss becomes stricter with every epoch of training, so the loss function can increase. (For instance, a capsule vector of length 0.7 produces zero error when the margin for a positive sample is 0.6, but it produces an error once the margin is 0.8.) I hope this explains the "steps" you saw in the training loss. Also, see #31: we turned off weight decay.
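In code, the same check (a minimal sketch, assuming the squared-hinge form max(0, m - length)^2 for the positive-class term; illustrative only, not this repo's code):

```python
# Minimal sketch of the length-0.7 example above (illustrative only).
length = 0.7  # 2-norm of the positive class's capsule output
for m in (0.6, 0.8):
    loss = max(0.0, m - length) ** 2
    print(f"margin={m}: loss={loss:.2f}")  # 0.6 -> 0.00 (no error), 0.8 -> 0.01
```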

ashleylid (Author) commented May 14, 2018

Thank you. I will have to review your answer a few more times to try and wrap my head around it.

Would you mind going into that a little more for me? I don't really understand your answer.

ashleylid reopened this May 14, 2018
yhyu13 (Collaborator) commented May 15, 2018

@kleinash

Sorry for the unclear explanation. Your persistence is much appreciated.

If you take a look at equation (5) of the Matrix Capsule paper, notice that a_t means the 2-norm of the capsule that represents the correct class, and a_i means the 2-norm of a capsule that represents a wrong class.

Let's plug in some numbers: a_t = 0.9, a_i = 0.1 for all wrong classes (10 classes in total, 9 wrong classes), and m = 0.2. In this case, the spread loss for each i is (max(0, 0.2 - (0.9 - 0.1)))^2 = 0; the total loss is the sum of the spread loss over the wrong classes, thus 0. In this case, the network is good enough.

If m = 0.9 instead, the spread loss for each i is (max(0, 0.9 - (0.9 - 0.1)))^2 = 0.01, so the total loss is 0.09. Thus, as the margin increases, the network needs to enlarge the spread between the correct class and all the wrong classes.
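Here is a minimal NumPy sketch of that calculation (an illustration of the formula, not this repo's implementation); it reproduces the jump from 0 to ~0.09 for the same sample:

```python
import numpy as np

def spread_loss(a_t, a_wrong, m):
    # Sum of (max(0, m - (a_t - a_i)))^2 over the wrong classes i.
    return float(np.sum(np.maximum(0.0, m - (a_t - a_wrong)) ** 2))

a_t = 0.9                  # activation (2-norm) of the correct class
a_wrong = np.full(9, 0.1)  # 9 wrong classes, each with activation 0.1

print(spread_loss(a_t, a_wrong, m=0.2))  # 0.0   -> this sample adds no loss
print(spread_loss(a_t, a_wrong, m=0.9))  # ~0.09 -> same sample is now penalized
```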

When m increases at each epoch, the network won't change immediately, but some samples that had zero loss in the previous epoch now have positive loss, which causes the loss function to jump a little. Eventually, though, the loss drops again.

ashleylid (Author) commented

Ok! Thank you. I think it's eq. (3) you are referring to?
