This repository was archived by the owner on Jul 1, 2023. It is now read-only.

Conversation

@jekbradbury
Contributor

The correct way to extend the initialization scheme introduced in Glorot and Bengio for dense layers to convolutional layers is to multiply the fanIn and fanOut sizes by the receptive field size (the product of the kernel's spatial dimensions). Keras, PyTorch, Lasagne, etc. all implement this correction, though without mentioning it in the relevant docstrings. As discussed for a related initialization in He et al., the correction is needed because the responses produced by a convolutional layer are equivalent to those produced by a pointwise dense layer over a feature space that has been expanded by a factor of the receptive field size.

Fixes the convergence discrepancy seen on CIFAR convnets.
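For concreteness, here is a minimal sketch of the corrected fan computation in Swift. `glorotUniformBound` is a hypothetical helper written for illustration, not the code added in this PR, and it assumes a kernel shape laid out as `[spatial..., inChannels, outChannels]`:

```swift
/// Glorot/Xavier uniform bound for a convolution kernel of shape
/// [spatial..., inChannels, outChannels]. Hypothetical helper for
/// illustration only.
func glorotUniformBound(kernelShape: [Int]) -> Double {
    precondition(kernelShape.count >= 2, "need at least in/out channel dims")
    // Receptive field size: product of the spatial kernel dimensions.
    // For a dense layer (no spatial dims) this is 1, recovering the
    // original Glorot and Bengio formula.
    let receptiveField = kernelShape.dropLast(2).reduce(1, *)
    let fanIn = receptiveField * kernelShape[kernelShape.count - 2]
    let fanOut = receptiveField * kernelShape[kernelShape.count - 1]
    // Weights are then sampled from Uniform(-bound, bound) with
    // bound = sqrt(6 / (fanIn + fanOut)).
    return (6.0 / Double(fanIn + fanOut)).squareRoot()
}

// Example: a 3x3 conv with 16 input and 32 output channels.
// fanIn = 9 * 16 = 144, fanOut = 9 * 32 = 288,
// bound = sqrt(6 / 432) ≈ 0.118.
print(glorotUniformBound(kernelShape: [3, 3, 16, 32]))
```

Without the receptive-field factor, the same 3x3 kernel would be initialized as if fanIn were 16 and fanOut were 32, giving a bound roughly three times too large.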

@jekbradbury jekbradbury requested review from rxwei and saeta February 22, 2019 08:27
Co-Authored-By: jekbradbury <jekbradbury@gmail.com>
@jekbradbury jekbradbury merged commit 22c923a into tensorflow:master Feb 22, 2019