Incorrect behavior from tf.layers.batch_normalization() when training=0 #10118
Comments
The documentation of tf.layers.batch_normalization mentions the update ops, but you didn't use them in your code.
Where did you find that? The API reference says nothing about needing to run extra operations or add extra dependencies to the train op. Here is the entire description:
Sorry, I was reading an old version of the doc. The way to run the update may have changed.
No, I was actually reading a newer version of the doc. See here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/layers/normalization.py#L338
Oh wow, I'm surprised that's not on the website... I'll close this then, since it looks like the change to the doc has already been made.
Just for the sake of completeness, here is the recommended code from the docs:
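(Reproduced from memory of the TF 1.x docstring rather than copied from this thread; `loss` and `optimizer` stand in for whatever the surrounding graph defines.)

```python
import tensorflow as tf

# The moving-mean/variance update ops are collected in GraphKeys.UPDATE_OPS;
# they must be made a dependency of the train op, or they never run.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
```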
I am using this, but it is showing an error. If I don't include "update_ops" in the sess.run() then it works fine, but I need to create the dependency.
@akssieg You should pass both ops to a single sess.run() call as one list of fetches. Try using: sess.run([train_step, update_ops], feed_dict={...}) (notice the square brackets).
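In context, that looks roughly like the sketch below (an illustration only; x, y, and the batch arrays are placeholder names, not from the thread):

```python
# Collect the batch-norm update ops once, after the graph is built.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

# Fetch the train step and the update ops together in one call;
# sess.run accepts a nested list of fetches.
sess.run([train_step, update_ops],
         feed_dict={x: batch_x, y: batch_y})
```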
System information
Describe the problem
I've noticed that tf.layers.batch_normalization doesn't seem to give reasonable results when training=0 (i.e. when it uses the accumulated distribution statistics instead of just the batch statistics), especially if you apply BN before activations (e.g. ResNet-like architectures). Using the Gist above, if you try to fit a model to noise with SGD (lr=0.01) using repeated applications of dense matrix multiplication -> batch normalization -> ReLU activations, you get very different losses for the same inputs over time (blue: training=1, green: training=0). Using an Adam (lr=0.001) optimizer instead gives even weirder results.
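For context, the layer stack being described is roughly the following (a minimal sketch, not the actual Gist; the input size, width, and depth are placeholders):

```python
import tensorflow as tf

training = tf.placeholder(tf.bool, name='training')
x = tf.placeholder(tf.float32, [None, 64])
y = tf.placeholder(tf.float32, [None, 1])

h = x
for i in range(4):
    with tf.variable_scope('block_%d' % i):
        # Dense -> batch norm -> ReLU, i.e. normalization before the activation.
        h = tf.layers.dense(h, 64)
        h = tf.layers.batch_normalization(h, training=training)
        h = tf.nn.relu(h)
out = tf.layers.dense(h, 1)

loss = tf.losses.mean_squared_error(y, out)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```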
However, if I use my own implementation of batch norm (included in the gist) I get reasonable results, with the losses for training=1 and training=0 being similar to each other (Adam shows similar behavior).
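A hand-rolled batch norm of the kind being compared against usually looks something like this (an illustration only, not the Gist's code; the decay rate and variable names are assumptions):

```python
def manual_batch_norm(x, training, decay=0.99, eps=1e-3):
    """Batch norm over the last axis with explicit moving averages.

    Callers should wrap each use in its own tf.variable_scope so the
    variables below get distinct names per layer.
    """
    depth = x.get_shape()[-1]
    gamma = tf.get_variable('gamma', [depth], initializer=tf.ones_initializer())
    beta = tf.get_variable('beta', [depth], initializer=tf.zeros_initializer())
    moving_mean = tf.get_variable('moving_mean', [depth],
                                  initializer=tf.zeros_initializer(),
                                  trainable=False)
    moving_var = tf.get_variable('moving_var', [depth],
                                 initializer=tf.ones_initializer(),
                                 trainable=False)

    def train_branch():
        # Normalize with the batch statistics and fold the moving-average
        # updates into the graph so they run whenever this branch runs.
        mean, var = tf.nn.moments(x, axes=[0])
        update_mean = tf.assign(moving_mean,
                                decay * moving_mean + (1 - decay) * mean)
        update_var = tf.assign(moving_var,
                               decay * moving_var + (1 - decay) * var)
        with tf.control_dependencies([update_mean, update_var]):
            return tf.nn.batch_normalization(x, mean, var, beta, gamma, eps)

    def eval_branch():
        # Normalize with the accumulated statistics.
        return tf.nn.batch_normalization(x, moving_mean, moving_var,
                                         beta, gamma, eps)

    return tf.cond(training, train_branch, eval_branch)
```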
(Interestingly, this doesn't seem to be as much of a problem if you have ReLU before BN; I haven't thought too deeply about why.)
Am I seeing things and just have some misunderstanding about what that function is doing, or is this actually a bug?