Update tf.contrib.layers.batch_norm() docs #4361
TensorFlow version that I use: 0.10 (pip package)
I have made heavy use of tf.contrib.layers.batch_norm() over the last weeks.
After facing some problems with how to use it correctly, I figured out that there are many devs out there who are confused as well, for example here:
I would suggest the following improvements to make it clearer:
1) Update example in doc-string:
The example says that if we leave updates_collections at its default, we have to include the following:
But this actually does not work (or is deprecated), as it throws errors. Instead, a few tiny changes are needed. I would suggest updating the docs as follows:
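Something along these lines (only a sketch of the corrected pattern, with a toy input and loss for illustration, not the exact docstring wording):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32])              # example input
h = tf.contrib.layers.batch_norm(x, is_training=True)   # registers update ops
total_loss = tf.reduce_mean(tf.square(h))               # stand-in loss

# Collect the moving mean/variance update ops that batch_norm added to the
# default updates collection (tf.GraphKeys.UPDATE_OPS).
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    # Note the *: tf.group takes the ops as positional arguments, not a list.
    updates = tf.group(*update_ops)
    # Make the loss depend on the updates so they run on every training step.
    with tf.control_dependencies([updates]):
        total_loss = tf.identity(total_loss)
```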
As a side question: why do we apply it to the total_loss, and not to the train_op directly, as described in the doc-string text? Adding a dependency to total_loss works, but grouping it with the train_op would make the example clearer in my opinion, because we do batch-statistic updates only during training.
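Continuing from the sketch above, this is roughly what I mean (the optimizer choice is just illustrative):

```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # The batch-statistic updates now run exactly when the train op runs,
    # i.e. only during training.
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(total_loss)
```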
2) UPDATE_OPS in combination with reuse varscope:
This is related to the question above. Let's say we have a model which reuses a convolutional encoder (and also its batch-norm layers) several times. Even when we reuse these layers, the update operations for the batch statistics are added to UPDATE_OPS again for every reuse. Personally, I'm not sure if this is a bug, or if this is really what should be done?
To sum this up: am I wrong that lines 213-215 should not be executed when reuse=True? So changing it to something like:
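Roughly this (only a rough sketch of the proposed change inside tf.contrib.layers.batch_norm, not a runnable snippet; the variable names are from my reading of the contrib source and may not match exactly):

```python
# Inside batch_norm(): only register the moving-average update ops the first
# time the layer's variables are created, not again for every reused instance.
if updates_collections is not None and not reuse:
    ops.add_to_collections(updates_collections, update_moving_mean)
    ops.add_to_collections(updates_collections, update_moving_variance)
```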
In my case, I'm using a Conv-LSTM-Conv_tp architecture, where I reuse the Conv/Conv_tp for each timestep. When I increase the number of timesteps in the LSTM, the number of update ops increases proportionally, while the number of model parameters stays constant because they get reused. Currently, I'm getting 420 update ops when calling:
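That is, when counting the collected update ops (the exact call is my assumption):

```python
import tensorflow as tf

# Assumed: counting the batch-norm update ops registered in the default collection.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
print(len(update_ops))  # 420 in the setup described above
```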
3) Handling of is_training parameter:
I have seen a lot of examples where people do something like this in their code to handle the is_training parameter:
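For example, a workaround along these lines (a sketch; it mirrors the batch_norm_layer function quoted later in this thread):

```python
import tensorflow as tf
from tensorflow.contrib.layers import batch_norm

x = tf.placeholder(tf.float32, [None, 64])
train_phase = tf.placeholder(tf.bool)

# Two batch_norm ops sharing one scope, switched by tf.cond at run time.
bn_train = batch_norm(x, is_training=True, updates_collections=None, scope='bn')
bn_infer = batch_norm(x, is_training=False, updates_collections=None, scope='bn', reuse=True)
out = tf.cond(train_phase, lambda: bn_train, lambda: bn_infer)
```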
As far as I know, this really was required in the past, because is_training could only be a Python boolean. But since the parameter can be a bool Tensor as well, this is not required anymore. Since many devs are still using this workaround, adding a note to the doc-string that it is no longer required could be helpful.
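A minimal sketch of the simpler usage (the placeholder shape is just for illustration):

```python
import tensorflow as tf
from tensorflow.contrib.layers import batch_norm

x = tf.placeholder(tf.float32, [None, 64])
is_training = tf.placeholder(tf.bool, name='is_training')

# is_training may be a bool tensor, so a single batch_norm call is enough;
# feed True during training steps and False during evaluation.
out = batch_norm(x, is_training=is_training, updates_collections=None)
```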
4) Usage on Multi-GPU configuration
a) When I optimize my code for multi-GPU systems (as in the CIFAR-10 example), the number of update ops increases by a factor of num_gpus (might be related to 2)).
b) When I use tf.contrib.layers.batch_norm() within a multi-GPU setup, I get an error like this:
Hence, do we have to wrap every batch_norm() call with tf.device("/cpu:0")? I guess this might have a bad impact on performance, right?
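For reference, the workaround I have in mind would look roughly like this (just a sketch):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
is_training = tf.placeholder(tf.bool)

# Pin the batch_norm call (and thus its variables) to the CPU inside each GPU tower.
with tf.device("/cpu:0"):
    net = tf.contrib.layers.batch_norm(x, is_training=is_training)
```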
PS: Sorry in case this would fit better on StackOverflow, as it is a combination of suggested improvements and questions. Just let me know...
Is reuse=True working? Whenever I try reuse=True I get errors like "Variable norm0/beta does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?". I'm following the docstring and providing the scope too. As far as I understand, when a variable is to be created using tf.get_variable() and reused, it first has to be created, and only then can its reuse be enabled with tf.get_variable_scope().reuse_variables().
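In other words, I would expect a pattern like this to work (only a sketch, scope names are illustrative):

```python
import tensorflow as tf
from tensorflow.contrib.layers import batch_norm

x1 = tf.placeholder(tf.float32, [None, 64])
x2 = tf.placeholder(tf.float32, [None, 64])

with tf.variable_scope('model') as vs:
    # First call creates norm0/beta, norm0/moving_mean, ...
    y1 = batch_norm(x1, scope='norm0', is_training=True)
    # Only after the variables exist can reuse be enabled.
    vs.reuse_variables()
    y2 = batch_norm(x2, scope='norm0', is_training=True)
```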
Please inform me if this is not the right place to raise this issue. I got to it from #1122
For (2), I agree with @bsautermeister, because I believe adding dependencies on
For (3), do we need to share the BN parameters for
```diff
 def batch_norm_layer(x, train_phase, scope_bn):
     bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
-                          updates_collections=None, is_training=True)
+                          updates_collections=None, is_training=True, scope=scope_bn)
     bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
-                              updates_collections=None, is_training=False)
+                              updates_collections=None, is_training=False, scope=scope_bn, reuse=True)
     bn = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
     return bn
```
NOTE: I simply ignored the invalid moving average/variance update in the code for simplicity.
@dasabir and @jfsantos I had the same issue, but by specifying the scope name for batch_norm, the issue was fixed. Under a scope with reuse=True,
I noticed that the docs haven't been updated yet. Would it be useful if the docs instead said:
```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(total_loss)
```
As for proper reuse across multiple data streams, it looks like a shareable version is still in the works.
As an aside, to the best of my understanding, the notion of a shareable BN layer should be treated with some care. Depending on the use-case, I think there should be an option to distinguish sharing of the moving averages from the sharing of the beta/gamma parameters as noted here.
I solved the problem of reusing batch normalization by specifying reuse=False when first creating the BN layer (I use slim, but it's the same for tf.layers.batch_normalization):
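A minimal sketch of that pattern (the scope name is just an example):

```python
import tensorflow as tf
slim = tf.contrib.slim

x1 = tf.placeholder(tf.float32, [None, 64])
x2 = tf.placeholder(tf.float32, [None, 64])
is_training = tf.placeholder(tf.bool)

# First call: reuse=False, so beta/gamma and the moving statistics get created.
y1 = slim.batch_norm(x1, scope='bn', reuse=False, is_training=is_training)
# Later calls: reuse=True to share those same variables.
y2 = slim.batch_norm(x2, scope='bn', reuse=True, is_training=is_training)
```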
You have to specify reuse=False the first time, so that the parameters of batch normalization get created. Otherwise you will get an error like:
I followed @wjiangcmu's advice, and it works.
// add update_ops before the second reuse, and filter out unrelated update_ops (unrelated moving mean and variance)
// second use:
// weight update and dependent extra_ops (moving mean and variance)
In addition, in order to update each batch_norm only once, following @bsautermeister's point about "UPDATE_OPS in combination with reuse varscope", I add the update_ops before the second use of each batch_norm and filter out the unrelated update_ops, roughly like this:
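One way to express the filtering part (a sketch; the scope prefix and optimizer are assumptions, adapt them to your graph):

```python
import tensorflow as tf

def train_op_with_filtered_updates(loss, scope_prefix, optimizer):
    """Attach only the update ops belonging to `scope_prefix` to the train op."""
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    # Drop update ops that belong to other (reused) instances of the layer,
    # so each moving mean/variance gets updated only once per step.
    relevant = [op for op in update_ops if op.name.startswith(scope_prefix)]
    with tf.control_dependencies(relevant):
        return optimizer.minimize(loss)
```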
Hope this will be helpful for others.