Match conv2 weights stddev with cuda-convnet's layer def. #2374
conv2 initW is 0.01 in cuda-convnet.
I'm trying the exercise.
I changed the layer type from fully connected to locally connected (well, just convolutional for now...).
Is this intended?
conv2's stddev parameter is different from the original cuda-convnet.
The current value (1e-4) causes vanishing gradients when trying the EXERCISE.
The original cuda-convnet's conv2 initW is 0.01,
while cifar10.py's conv2 stddev is 1e-4.
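For context, the two initializations differ by two orders of magnitude. A minimal numpy sketch (using `np.random.normal` in place of `tf.truncated_normal`, and assuming conv2's kernel shape of 5x5x64x64 from cifar10.py) makes the scale difference concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# conv2 kernel shape in cifar10.py: 5x5 spatial, 64 input channels, 64 output channels
shape = (5, 5, 64, 64)

w_cuda_convnet = rng.normal(0.0, 1e-2, size=shape)  # initW = 0.01 in cuda-convnet
w_cifar10_py = rng.normal(0.0, 1e-4, size=shape)    # stddev = 1e-4 in cifar10.py

# The sampled weights differ in scale by roughly 100x.
print(w_cuda_convnet.std())  # ~0.01
print(w_cifar10_py.std())    # ~0.0001
```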
I think this may be a typo, since the TensorFlow documentation itself raises this as a question.
I'm trying this exercise (unfortunately I cannot find a model answer on the web).
Here's my current result: with a weight stddev of 1e-4, TensorBoard shows vanishing gradients at the conv2 layer.
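The vanishing-gradient effect can be reproduced outside TensorFlow. A rough numpy sketch (treating the conv as a linear layer with fan-in 5*5*64, an assumption matching conv2's kernel) shows that the output scale, and hence the backpropagated gradient scale, is proportional to the weight stddev:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_output_std(w_std, fan_in=5 * 5 * 64, n=10000):
    # Output variance of a linear layer is roughly fan_in * w_std^2 * input_var,
    # so output std scales linearly with the weight stddev.
    x = rng.normal(0.0, 1.0, size=(n, fan_in))
    w = rng.normal(0.0, w_std, size=(fan_in,))
    return (x @ w).std()

print(layer_output_std(1e-2))  # ~0.4: the signal survives this layer
print(layer_output_std(1e-4))  # ~0.004: signal (and gradients) shrink ~100x
```

With stddev 1e-4, each pass through the layer attenuates the signal by about two orders of magnitude relative to 1e-2, which is consistent with the gradients at conv2 collapsing in TensorBoard.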
@keiji your solution doesn't exactly replicate Alex's model. You use convolutional layers for local3 and local4, whereas Alex used locally-connected layers that do not share their parameters across patches. My guess is that's possibly one reason why you see such a strong effect of initialization.
@vincentvanhoucke: To make sure I understand: you mean a convolutional layer but using separate filters for each output point? You could do it with batch matmul and a bunch of reshaping / tiling logic, but it would be quite slow. I think we'd need a custom op to do it fast. And actually, even the reshaping / tiling logic may be out of reach in a performant way.
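For reference, the "separate filters per output point" idea can be sketched as a naive numpy implementation (a hypothetical reference, not the fast custom op discussed above; loops stand in for the batch-matmul/tiling logic):

```python
import numpy as np

def locally_connected_2d(x, w, b):
    """Locally-connected layer: like a convolution, but with a separate
    filter for every output position, i.e. no weight sharing across patches.

    x: (H, W, C_in) input
    w: (H_out, W_out, kH, kW, C_in, C_out), one filter bank per output position
    b: (H_out, W_out, C_out) per-position biases
    """
    H_out, W_out, kH, kW, C_in, C_out = w.shape
    out = np.empty((H_out, W_out, C_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = x[i:i + kH, j:j + kW, :]                       # input patch
            out[i, j] = np.einsum('abc,abcd->d', patch, w[i, j]) + b[i, j]
    return out

# Tiny smoke test: 4x4x3 input, 3x3 kernels, valid padding -> 2x2x8 output
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 3))
w = rng.normal(size=(2, 2, 3, 3, 3, 8))
b = np.zeros((2, 2, 8))
print(locally_connected_2d(x, w, b).shape)  # (2, 2, 8)
```

If every `w[i, j]` holds the same filter bank, this reduces to an ordinary valid-padding convolution, which is why the parameter count (and the effect of initialization) differs so much between the two layer types.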