The fully connected layers added on top of the capsule network contain ~1.6M parameters, whereas the capsules themselves only have roughly 60k trainable parameters in the small configuration. Since matrix capsules are supposed to generalize better with fewer parameters than traditional architectures, this approach seems counterintuitive to me.
However, removing the reconstruction loss and training with spread loss alone doesn't appear to converge (on smallNORB). Were you able to train your network with spread loss only (as suggested by the paper)?
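For reference, the spread loss from the EM-routing paper penalizes every wrong-class activation that comes within a margin of the target-class activation. Here is a minimal NumPy sketch (the function name and signature are my own; the paper's loss is the squared hinge over wrong classes):

```python
import numpy as np

def spread_loss(activations, target, margin):
    """Spread loss: sum over wrong classes i of max(0, margin - (a_t - a_i))^2,
    where a_t is the activation of the target-class capsule.

    activations: 1-D array of class-capsule activations
    target:      integer index of the correct class
    margin:      scalar margin m (annealed during training in the paper)
    """
    a_t = activations[target]
    # Hinge: only wrong classes closer than `margin` to a_t incur a penalty.
    hinge = np.maximum(0.0, margin - (a_t - activations))
    hinge[target] = 0.0  # exclude the target class itself
    return float(np.sum(hinge ** 2))
```

With a confident target activation the loss is zero; as a wrong class approaches the target within the margin, the penalty grows quadratically.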
When I trained with spread loss only using 2 routing iterations, the training did converge, but the model only seemed to start learning after 6k iterations.
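The slow start may be related to the margin schedule: the paper anneals the spread-loss margin m from a small initial value up to 0.9 during training, so early on the loss is nearly flat. A sketch of a linear schedule (the schedule length here is an assumption, not from the paper):

```python
def margin_at_step(step, anneal_steps=50000, m_min=0.2, m_max=0.9):
    """Linearly anneal the spread-loss margin from m_min to m_max.
    `anneal_steps` is a guessed hyperparameter; the paper does not
    specify the exact schedule length."""
    frac = min(step / anneal_steps, 1.0)
    return m_min + frac * (m_max - m_min)
```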
No matter what I have tried so far, I can't seem to get the model to work with 3 or more routing iterations.
Interesting, what were your results like? The paper reports a test error rate of 2.2% on the smallNORB dataset, even with just two routing iterations (albeit with 32 capsule types per layer).