Update tf.contrib.layers.batch_norm() docs #4361
TensorFlow version that I use: 0.10 (pip package)
I have made heavy use of tf.contrib.layers.batch_norm() over the last few weeks.
After running into some problems with how to use it correctly, I noticed that many other devs are confused as well, for example here:
I would suggest the following improvements to make things clearer:
1) Update example in doc-string:
The example says that if we use update_collections with its default value, we have to include this:
But this does not actually work (or is deprecated), as it throws errors. Instead, a few small changes are needed. I would suggest updating the docs as follows:
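Roughly along these lines (a sketch of the pattern, not the exact snippet; total_loss is just a placeholder here):

```python
# make the loss depend on the moving mean/variance update ops in UPDATE_OPS
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    updates = tf.group(*update_ops)
    with tf.control_dependencies([updates]):
        total_loss = tf.identity(total_loss)
```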
As a side question: why do we apply it to the total_loss and not to the train_op directly, as described in the doc-string text? Adding a dependency on total_loss works, but grouping it with the train_op would make the example clearer in my opinion, because we do batch-statistic updates only during training.
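For instance, something like this sketch (optimizer and total_loss are placeholders again):

```python
# attach the batch-statistic updates to the train op instead of the loss
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(total_loss)
train_op = tf.group(train_op, *update_ops)
```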
2) UPDATE_OPS in combination with reuse varscope:
This is related to the question above. Let's say we have a model which reuses a convolutional encoder (and also its batch-norm layers) several times. Even when we reuse these layers, the update operations for the batch statistics are added to UPDATE_OPS nevertheless. Personally, I'm not sure whether this is a bug or whether this is really the intended behavior.
To sum this up: am I wrong that lines 213-215 should not be executed when reuse=True? So changing it to:
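Something in this direction (just a sketch; the actual variable names in layers.py may differ):

```python
# sketch only -- names approximate what layers.py does around those lines
if not reuse:  # proposed guard: skip registering updates when the layer is reused
    ops.add_to_collections(updates_collections, update_moving_mean)
    ops.add_to_collections(updates_collections, update_moving_variance)
```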
In my case, I'm using a Conv-LSTM-Conv_tp architecture, where I reuse the Conv/Conv_tp for each timestep. When I increase the number of timesteps in the LSTM, the number of update-ops increases proportionally, while the number of model parameters stays constant because they get reused. Currently, I'm getting 420 update-ops when calling
3) Handling of is_training parameter:
I have seen a lot of examples where people do something like this in their code to handle the is_training parameter:
As far as I know, this really was required in the past, because is_training was just a Boolean. But since the parameter can be a bool Tensor as well, this is not required anymore. Since many devs are still using this workaround, adding a comment to the doc-string that it is no longer required could be helpful.
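With a tensor, a single layer definition is enough; a minimal sketch (the input shape and scope name are just examples):

```python
x = tf.placeholder(tf.float32, [None, 32, 32, 3])
is_training = tf.placeholder(tf.bool, name='is_training')
# one batch_norm instead of two branches wrapped in tf.cond
net = tf.contrib.layers.batch_norm(x, decay=0.999, center=True, scale=True,
                                   is_training=is_training, scope='bn')
# feed {is_training: True} for training steps and {is_training: False} for evaluation
```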
4) Usage on Multi-GPU configuration
a) When I optimize my code for multi-GPU systems (as in the CIFAR10 example), the number of update-ops increases by a factor of num_gpus (might be related to 2)).
b) When I use tf.contrib.layers.batch_norm() within a multi-GPU setup, I get an error like this:
Hence, do we have to wrap every batch_norm() call with tf.device("/cpu:0")? I guess this might have a bad impact on performance, right?
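For reference, the workaround I mean would look roughly like this (a hypothetical sketch only; the tower function and scope names are made up):

```python
# pin the batch_norm call onto the CPU inside a GPU tower
def tower(images, gpu_id):
    with tf.device('/gpu:%d' % gpu_id):
        net = tf.contrib.layers.conv2d(images, 64, [3, 3], scope='conv1')
        with tf.device('/cpu:0'):  # forces the batch-norm variables/ops onto the CPU
            net = tf.contrib.layers.batch_norm(net, is_training=True, scope='bn1')
    return net
```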
PS: Sorry in case this would fit better on StackOverflow, as it is a combination of suggested improvements and questions. Just let me know...
Is reuse=True working? Whenever I try 'reuse=True' I get errors like "Variable norm0/beta does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?" I'm following the docstring and providing the 'scope' too. As far as I understand, when a variable is to be created using tf.get_variable() and reused, it first has to be created, and then its reuse has to be enabled using tf.get_variable_scope().reuse_variables().
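That is, something along these lines (just a sketch of what I mean; the names are placeholders):

```python
x1 = tf.placeholder(tf.float32, [None, 10])
x2 = tf.placeholder(tf.float32, [None, 10])
with tf.variable_scope('model') as vs:
    y1 = tf.contrib.layers.batch_norm(x1, is_training=True, scope='norm0')
    vs.reuse_variables()  # 'model/norm0/beta' etc. exist now, so they can be reused
    y2 = tf.contrib.layers.batch_norm(x2, is_training=True, scope='norm0')
```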
Please inform me if this is not the right place to raise this issue. I got to it from #1122
For (2), I agree with @bsautermeister because, as I believe, adding dependencies on
For (3), do we need to share the BN parameters for
```diff
 def batch_norm_layer(x, train_phase, scope_bn):
     bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
-                          updates_collections=None, is_training=True)
+                          updates_collections=None, is_training=True, scope=scope_bn)
     bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
-                              updates_collections=None, is_training=False)
+                              updates_collections=None, is_training=False, scope=scope_bn, reuse=True)
     bn = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
     return bn
```
NOTE: I simply ignored the invalid moving average/variance update in the code for simplicity.
@dasabir and @jfsantos I had the same issue, but by specifying the scope name for batch_norm, the issue was fixed. Under a variable scope with reuse=True,
I noticed that the docs haven't been updated yet. Would it be useful if the docs instead said:
```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(total_loss)
```
As for proper reuse across multiple data streams, it looks like a shareable version is still in the works.
As an aside, to the best of my understanding, the notion of a shareable BN layer should be treated with some care. Depending on the use-case, I think there should be an option to distinguish sharing of the moving averages from the sharing of the beta/gamma parameters as noted here.
I solved the problem of reusing batch normalization by specifying reuse=False when first creating the BN layer (I use slim, but it's the same for tf.layers.batch_normalization):
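Something like this sketch (the network and scope names are just examples):

```python
import tensorflow as tf
slim = tf.contrib.slim

def model(x, is_training, reuse):
    with tf.variable_scope('net', reuse=reuse):
        h = slim.conv2d(x, 64, [3, 3], scope='conv1')
        h = slim.batch_norm(h, is_training=is_training, scope='bn1')
    return h

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
train_out = model(x, is_training=True, reuse=False)  # first call creates the BN variables
eval_out = model(x, is_training=False, reuse=True)   # later calls reuse them
```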
You have to specify reuse=False the first time so that the batch-normalization parameters are created; otherwise you will get an error like:
I followed @wjiangcmu's advice, and it works.
// add update_ops before the second reuse, and filter out unrelated update_ops (unrelated moving mean and variance)
// second use:
// weight update and dependent extra_ops (moving mean and variance)
In addition, in order to update each batch_norm only once, following @bsautermeister's point 2) ("UPDATE_OPS in combination with reuse varscope"), I add the update_ops before the second use of each batch_norm and filter out the unrelated update_ops.
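A minimal sketch of the idea (the encoder, shapes, and names are just examples; here the "filtering" is done simply by reading the collection before the second use):

```python
import tensorflow as tf

def encoder(x, reuse):
    # example shared encoder containing a batch_norm layer
    with tf.variable_scope('enc', reuse=reuse):
        h = tf.contrib.layers.conv2d(x, 8, [3, 3], scope='conv1')
        h = tf.contrib.layers.batch_norm(h, is_training=True, scope='bn1')
    return h

x1 = tf.placeholder(tf.float32, [None, 16, 16, 3])
x2 = tf.placeholder(tf.float32, [None, 16, 16, 3])

out1 = encoder(x1, reuse=False)
# take only the update ops created by the first use
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

out2 = encoder(x2, reuse=True)  # the duplicate update ops added here are ignored

loss = tf.reduce_mean(tf.square(out1 - out2))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# each moving mean/variance is updated exactly once per training step
train_op = tf.group(train_step, *update_ops)
```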
Hope this will be helpful for others.