Update tf.contrib.layers.batch_norm() docs #4361
TensorFlow version that I use: 0.10 (pip package)
I have made heavy use of tf.contrib.layers.batch_norm() over the last weeks.
After facing some problems with how to use it correctly, I figured out that there are many devs out there who are confused as well, for example here:
I would suggest the following improvements to make it clearer:
1) Update example in doc-string:
The example says that if we leave updates_collections at its default, we have to include the following:
But this actually does not work (or is deprecated), as it throws errors. Instead, a few tiny changes are needed. I would suggest updating the docs as follows:
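Something along these lines (only a sketch of the corrected pattern, with a toy input and loss for illustration, not the exact docstring wording):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32])              # example input
h = tf.contrib.layers.batch_norm(x, is_training=True)   # registers update ops
total_loss = tf.reduce_mean(tf.square(h))               # stand-in loss

# Collect the moving mean/variance update ops that batch_norm added to the
# default updates collection (tf.GraphKeys.UPDATE_OPS).
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    # Note the *: tf.group takes the ops as positional arguments, not a list.
    updates = tf.group(*update_ops)
    # Make the loss depend on the updates so they run on every training step.
    with tf.control_dependencies([updates]):
        total_loss = tf.identity(total_loss)
```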
As a side question: why do we apply it to the total_loss, and not to the train_op directly, as described in the doc-string text? Adding a dependency to total_loss works, but grouping it with the train_op would make the example clearer in my opinion, because we do batch-statistic updates only during training.
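Continuing from the sketch above, this is roughly what I mean (the optimizer choice is just illustrative):

```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # The batch-statistic updates now run exactly when the train op runs,
    # i.e. only during training.
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(total_loss)
```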
2) UPDATE_OPS in combination with reuse varscope:
This is related to the question above. Let's say we have a model which reuses a convolutional encoder (and also its batch-norm layers) several times. Even when we reuse these layers, the update operations for the batch statistics are added to UPDATE_OPS again for every reuse. Personally, I'm not sure if this is a bug, or if this is really what should be done?
To sum this up: am I wrong that lines 213-215 should not be executed when reuse=True? So changing it to something like:
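Roughly this (only a rough sketch of the proposed change inside tf.contrib.layers.batch_norm, not a runnable snippet; the variable names are from my reading of the contrib source and may not match exactly):

```python
# Inside batch_norm(): only register the moving-average update ops the first
# time the layer's variables are created, not again for every reused instance.
if updates_collections is not None and not reuse:
    ops.add_to_collections(updates_collections, update_moving_mean)
    ops.add_to_collections(updates_collections, update_moving_variance)
```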
In my case, I'm using a Conv-LSTM-Conv_tp architecture, where I reuse the Conv/Conv_tp for each timestep. When I increase the number of timesteps in the LSTM, the number of update ops increases proportionally, while the number of model parameters stays constant because they get reused. Currently, I'm getting 420 update ops when calling:
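That is, when counting the collected update ops (the exact call is my assumption):

```python
import tensorflow as tf

# Assumed: counting the batch-norm update ops registered in the default collection.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
print(len(update_ops))  # 420 in the setup described above
```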
3) Handling of is_training parameter:
I have seen a lot of examples where people do something like this in their code to handle the is_training parameter:
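For example, a workaround along these lines (a sketch; it mirrors the batch_norm_layer function quoted later in this thread):

```python
import tensorflow as tf
from tensorflow.contrib.layers import batch_norm

x = tf.placeholder(tf.float32, [None, 64])
train_phase = tf.placeholder(tf.bool)

# Two batch_norm ops sharing one scope, switched by tf.cond at run time.
bn_train = batch_norm(x, is_training=True, updates_collections=None, scope='bn')
bn_infer = batch_norm(x, is_training=False, updates_collections=None, scope='bn', reuse=True)
out = tf.cond(train_phase, lambda: bn_train, lambda: bn_infer)
```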
As far as I know, this really was required in the past, because is_training could only be a Python boolean. But since the parameter can be a bool Tensor as well, this is not required anymore. Since many devs are still using this workaround, adding a note to the doc-string that it is no longer required could be helpful.
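A minimal sketch of the simpler usage (the placeholder shape is just for illustration):

```python
import tensorflow as tf
from tensorflow.contrib.layers import batch_norm

x = tf.placeholder(tf.float32, [None, 64])
is_training = tf.placeholder(tf.bool, name='is_training')

# is_training may be a bool tensor, so a single batch_norm call is enough;
# feed True during training steps and False during evaluation.
out = batch_norm(x, is_training=is_training, updates_collections=None)
```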
4) Usage on Multi-GPU configuration
a) When I optimize my code for multi-GPU systems (as in the CIFAR-10 example), the number of update ops increases by a factor of num_gpus (might be related to 2)).
b) When I use tf.contrib.layers.batch_norm() within a multi-GPU setup, I get an error like this:
Hence, do we have to wrap every batch_norm() call with tf.device("/cpu:0")? I guess this might have a bad impact on performance, right?
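For reference, the workaround I have in mind would look roughly like this (just a sketch):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
is_training = tf.placeholder(tf.bool)

# Pin the batch_norm call (and thus its variables) to the CPU inside each GPU tower.
with tf.device("/cpu:0"):
    net = tf.contrib.layers.batch_norm(x, is_training=is_training)
```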
PS: Sorry in case this would fit better on StackOverflow, as it is a combination of suggested improvements and questions. Just let me know...
Is reuse=True working? Whenever I try reuse=True I get errors like "Variable norm0/beta does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?". I'm following the docstring and providing the scope too. As far as I understand, when a variable is to be created using tf.get_variable() and reused, it first has to be created, and only then can its reuse be enabled with tf.get_variable_scope().reuse_variables().
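In other words, I would expect a pattern like this to work (only a sketch, scope names are illustrative):

```python
import tensorflow as tf
from tensorflow.contrib.layers import batch_norm

x1 = tf.placeholder(tf.float32, [None, 64])
x2 = tf.placeholder(tf.float32, [None, 64])

with tf.variable_scope('model') as vs:
    # First call creates norm0/beta, norm0/moving_mean, ...
    y1 = batch_norm(x1, scope='norm0', is_training=True)
    # Only after the variables exist can reuse be enabled.
    vs.reuse_variables()
    y2 = batch_norm(x2, scope='norm0', is_training=True)
```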
Please inform me if this is not the right place to raise this issue. I got to it from #1122
For (2), I agree with @bsautermeister, because I believe adding dependencies on
For (3), do we need to share the BN parameters for
```diff
 def batch_norm_layer(x, train_phase, scope_bn):
     bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
-                          updates_collections=None, is_training=True)
+                          updates_collections=None, is_training=True, scope=scope_bn)
     bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
-                              updates_collections=None, is_training=False)
+                              updates_collections=None, is_training=False, scope=scope_bn, reuse=True)
     bn = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
     return bn
```
NOTE: I simply ignored the invalid moving average/variance update in the code for simplicity.
@dasabir and @jfsantos I had the same issue, but by specifying the scope name for batch_norm, the issue was fixed. Under a scope with reuse=True,
I noticed that the docs haven't been updated yet. Would it be useful if the docs instead said:
```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(total_loss)
```
As for proper reuse across multiple data streams, it looks like a shareable version is still in the works.
As an aside, to the best of my understanding, the notion of a shareable BN layer should be treated with some care. Depending on the use-case, I think there should be an option to distinguish sharing of the moving averages from the sharing of the beta/gamma parameters as noted here.
I solved the problem of reusing batch normalization by specifying reuse=False when first creating the BN layer (I use slim, but it's the same for tf.layers.batch_normalization):
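A minimal sketch of that pattern (the scope name is just an example):

```python
import tensorflow as tf
slim = tf.contrib.slim

x1 = tf.placeholder(tf.float32, [None, 64])
x2 = tf.placeholder(tf.float32, [None, 64])
is_training = tf.placeholder(tf.bool)

# First call: reuse=False, so beta/gamma and the moving statistics get created.
y1 = slim.batch_norm(x1, scope='bn', reuse=False, is_training=is_training)
# Later calls: reuse=True to share those same variables.
y2 = slim.batch_norm(x2, scope='bn', reuse=True, is_training=is_training)
```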
You have to specify reuse=False the first time, so that the parameters of batch normalization get created. Otherwise you will get an error like:
I followed @wjiangcmu's advice, and it works.
// add update_ops before the second reuse, and filter out unrelated update_ops (unrelated moving mean and variance)
// second use:
// weight update and dependent extra_ops (moving mean and variance)
In addition, in order to update each batch_norm only once, following @bsautermeister's point about "UPDATE_OPS in combination with reuse varscope", I add the update_ops before the second use of each batch_norm and filter out the unrelated update_ops, roughly like this:
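One way to express the filtering part (a sketch; the scope prefix and optimizer are assumptions, adapt them to your graph):

```python
import tensorflow as tf

def train_op_with_filtered_updates(loss, scope_prefix, optimizer):
    """Attach only the update ops belonging to `scope_prefix` to the train op."""
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    # Drop update ops that belong to other (reused) instances of the layer,
    # so each moving mean/variance gets updated only once per step.
    relevant = [op for op in update_ops if op.name.startswith(scope_prefix)]
    with tf.control_dependencies(relevant):
        return optimizer.minimize(loss)
```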
Hope this will be helpful for others.