a problem on routing iteration #10

Open
rtz19970824 opened this issue Dec 1, 2017 · 18 comments
rtz19970824 commented Dec 1, 2017

Hi,

If the number of routing iterations is greater than 1, the loss becomes NaN. What can I do to fix it?

@rtz19970824 (Author)

I found that the NaN is caused by a small sigma, which leads to a very large likelihood. You may try to calculate log(P) instead of P and compute r with tf.reduce_logsumexp. @www0wwwjs1
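For illustration only (this is not code from the repository), here is a minimal numpy sketch of the failure mode and of the log-space fix, with a made-up sigma and a 16-dimensional pose; np.logaddexp.reduce stands in for tf.reduce_logsumexp:

import numpy as np

# With a very small sigma, each per-dimension Gaussian likelihood is huge;
# summing 16 log-terms and exponentiating overflows float32 to inf.
sigma = 1e-4
log_p_dim = -0.5 * np.log(2 * np.pi * sigma ** 2)            # about +8.3 per pose dimension
log_p = np.float32(16 * log_p_dim)                           # log-likelihood of one vote
print(np.exp(log_p))                                         # inf: p itself cannot be represented

# Normalising in log-space never materialises p, so r stays finite and in (0, 1].
logits = np.array([log_p, log_p - 5.0], dtype=np.float32)    # two competing capsules
r = np.exp(logits - np.logaddexp.reduce(logits))             # same trick as tf.reduce_logsumexp
print(r)                                                     # approx. [0.993, 0.007]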

@www0wwwjs1 (Owner)

Thanks a lot for the suggestion!

@yhyu13 (Collaborator) commented Dec 3, 2017

@www0wwwjs1

Is the suggestion resolved?

@rtz19970824 (Author) commented Dec 3, 2017

There are still some problems. I printed out the activations of the class-capsule layer, and it seems that all of them are 1.0. I guess this is caused by a large -log(sigma_h), which makes the logits very large. It seems that decreasing the temperature (lambda) to 1e-2 works. I don't know whether this is reasonable; what do you think? @www0wwwjs1 Thanks!
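As a rough, standalone illustration of why a smaller lambda avoids the saturation (not code from the repository, and assuming the activation has the paper's form a = sigmoid(lambda * (beta_a - cost)); the cost value below is made up):

import numpy as np

def activation(cost, beta_a=0.0, lam=1.0):
    # M-step activation in the paper's form: a = logistic(lambda * (beta_a - cost))
    return 1.0 / (1.0 + np.exp(-lam * (beta_a - cost)))

# A tiny sigma makes -log(sigma), and hence |cost|, large, e.g. cost around -500.
cost = -500.0
print(activation(cost, lam=1.0))     # 1.0: saturated, the gradient is numerically zero
print(activation(cost, lam=0.01))    # about 0.993: close to 1 but not saturated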

@yhyu13 (Collaborator) commented Dec 3, 2017

@rtz19970824

Could you please mark the particular lines that you're improving? Thanks!

@www0wwwjs1 (Owner)

\lambda = 0.01 is already used in the latest version of the code, and it helps robustness. Another implementation also adopts a similar value. It looks like a reasonable configuration here; however, the original paper barely mentions specific values for such parameters.

@rtz19970824 (Author)

@yhyu13
I changed the E-step to the following (working in log-space):

import math
import tensorflow as tf

# log-likelihood of each vote component under the Gaussian (mean miu, variance sigma_square)
p_c_h = -0.5 * tf.log(2 * math.pi * sigma_square) - tf.square(votes - miu) / (2 * sigma_square)
# sum the per-dimension log-likelihoods over the pose dimensions
p_c = tf.reduce_sum(p_c_h, axis=3)
# log of the activations of the output capsules
a1 = tf.log(tf.reshape(activation1, shape=[batch_size, 1, caps_num_c]))
ap = p_c + a1
# normalise in log-space: r = exp(log(a*p) - logsumexp_c(log(a*p)))
sum_ap = tf.reduce_logsumexp(ap, axis=2, keep_dims=True)
r = tf.exp(ap - sum_ap)

@www0wwwjs1 (Owner)

Thanks for the suggestion on stability; a larger number of iterations is supported now.

@rtz19970824 (Author)

I'm still a little confused. Is this line necessary?

log_p_c_h = log_p_c_h - (tf.reduce_max(log_p_c_h, axis=[2, 3], keep_dims=True) - tf.log(10.0))

@www0wwwjs1 @yhyu13

@yhyu13 (Collaborator) commented Dec 5, 2017

@rtz19970824

Yes. If you comment out that line, training immediately gives NaN gradients and loss on both datasets. Let me know if this happens on your machine too.

Also, we are still using Yunzhi Shi's contribution because I find the network learns a bit faster in his setting. However, your contribution in this discussion is valuable and is much appreciated.

Yunzhi Shi's setting: [attached plot "smaple2"]

Your setting: [attached plot "sample1"]

@www0wwwjs1 (Owner)

@rtz19970824

The purpose of this line is only to help the numerical stability, as shown in @yhyu13's comment. It does not correspond to any part of the algorithm in the original paper.
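For what it's worth, a small standalone sketch of what that shift does, with made-up numbers rather than the repository's tensors:

import numpy as np

# Hypothetical log-likelihoods of one vote under three output capsules. They are so
# negative that exp() underflows to 0, and the later normalisation becomes 0/0 (NaN).
log_p = np.array([-1000.0, -1005.0, -1010.0], dtype=np.float32)
p = np.exp(log_p)
print(p / p.sum())                               # [nan nan nan]

# Shifting so the maximum becomes log(10), as in the quoted line, rescales every p
# by the same constant; that constant cancels in the normalisation, but exp() now
# stays in a representable range.
shifted = log_p - (log_p.max() - np.log(10.0))
p = np.exp(shifted)
print(p / p.sum())                               # approx. [0.993, 0.0067, 0.000045]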

@rtz19970824 (Author)

Thanks.

It seems that the results on MNIST don't match your report; maybe there should be a check? @www0wwwjs1

@www0wwwjs1 (Owner)

The experiment on MNIST was run with an old version of the code. After that experiment, we added more things to further improve performance, especially for the more challenging smallNORB dataset. These auxiliary parts may affect the performance on MNIST as well, though I expect the influence to be positive. I also plan to rerun the experiments on MNIST after we finish the current experiments on smallNORB. If a negative influence is observed, please let us know; many thanks.

@rtz19970824 (Author)

I cloned the latest version and ran it without changing any hyper-parameters. The loss is decreasing; however, the test accuracy does not improve. I also tried setting the "is_train" flag to false, and the test accuracy still does not improve. I just ran the commands python3 train.py "mnist" and python3 eval.py "mnist". If there is anything wrong with my experiment settings, please let me know. Thanks!

@www0wwwjs1 (Owner)

Sorry, that's my bad. I pushed some experimental configurations without solid validation. You can clone the project again; the latest configuration should be valid. The same hyper-parameters are also used in our newest experiments. Although the final results are still pending, the test accuracy is already going up.

@rtz19970824 (Author)

Thanks! It performs well now.

@ashleygritzman

Did anyone manage to get this to work with 3 or more iterations?

@Sirius083

In EM routing's M-step, why is the mean not updated in the first two iterations (capsule_em.py L352)? The else branch is not clear; could you give some explanation here? Thanks.
