The fully connected layers added on top of the capsule network contain ~1.6M parameters, whereas the capsules themselves only have roughly 60k trainable parameters in the small configuration. Since matrix capsules are supposed to generalize better with fewer parameters than traditional architectures, this approach seems counterintuitive to me.
However, removing the reconstruction loss and training with spread loss alone doesn't appear to converge (on smallNORB). Were you able to train your network with spread loss only (as suggested by the paper)?
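For reference, the spread loss from the EM-routing paper penalizes every wrong-class activation that comes within a margin of the target-class activation. Here is a minimal NumPy sketch (the function name and signature are my own; the paper's loss is the squared hinge over wrong classes):

```python
import numpy as np

def spread_loss(activations, target, margin):
    """Spread loss: sum over wrong classes i of max(0, margin - (a_t - a_i))^2,
    where a_t is the activation of the target-class capsule.

    activations: 1-D array of class-capsule activations
    target:      integer index of the correct class
    margin:      scalar margin m (annealed during training in the paper)
    """
    a_t = activations[target]
    # Hinge: only wrong classes closer than `margin` to a_t incur a penalty.
    hinge = np.maximum(0.0, margin - (a_t - activations))
    hinge[target] = 0.0  # exclude the target class itself
    return float(np.sum(hinge ** 2))
```

With a confident target activation the loss is zero; as a wrong class approaches the target within the margin, the penalty grows quadratically.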
When I trained with spread loss only using 2 routing iterations, the training did converge, but the model only seemed to start learning after 6k iterations.
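The slow start may be related to the margin schedule: the paper anneals the spread-loss margin m from a small initial value up to 0.9 during training, so early on the loss is nearly flat. A sketch of a linear schedule (the schedule length here is an assumption, not from the paper):

```python
def margin_at_step(step, anneal_steps=50000, m_min=0.2, m_max=0.9):
    """Linearly anneal the spread-loss margin from m_min to m_max.
    `anneal_steps` is a guessed hyperparameter; the paper does not
    specify the exact schedule length."""
    frac = min(step / anneal_steps, 1.0)
    return m_min + frac * (m_max - m_min)
```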
No matter what I have tried so far, I can't seem to get the model to work with 3 or more routing iterations.
Interesting, what were your results like? The paper reports a test error rate of 2.2% on the smallNORB dataset, even with just two routing iterations (albeit with 32 capsule types per layer).