docs: batch normalization usage in slim #7469

Closed
soloice opened this Issue Feb 13, 2017 · 12 comments

@soloice
Contributor

soloice commented Feb 13, 2017

How do I use batch normalization in the testing phase?

I tried to use batch normalization to train a model like this:

bn = lambda x: slim.batch_norm(x, is_training=is_training)
conv = slim.conv2d(images, 64, [3, 3], 1, normalizer_fn=bn, padding='SAME', scope='conv')

But when I finished training and restored my model from checkpoint files, the model's performance on the testing set was poor, just like random guessing.

If these parameters (the moving mean and variance) are not saved as model variables, would it be possible to add an example illustrating how to use batch normalization in slim, especially for inference?
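
For reference, a minimal sketch of how slim.batch_norm is typically wired up for both phases; the is_training placeholder and the remark about the saver below are illustrative assumptions, not something taken from the report above:

import tensorflow as tf

slim = tf.contrib.slim

# Boolean placeholder that switches batch norm between batch statistics
# (training) and the accumulated moving averages (inference).
is_training = tf.placeholder(tf.bool, [], name='is_training')
images = tf.placeholder(tf.float32, [None, 28, 28, 1])

net = slim.conv2d(images, 64, [3, 3],
                  normalizer_fn=slim.batch_norm,
                  normalizer_params={'is_training': is_training},
                  padding='SAME', scope='conv')

# The moving mean/variance are ordinary non-trainable variables, so a default
# tf.train.Saver checkpoint includes them; at inference time, feed
# is_training=False so the layer uses them instead of batch statistics.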

@cancan101

Contributor

cancan101 commented Feb 13, 2017

When training, are you using slim.learning.create_train_op(loss, optimizer)?

@soloice

Contributor

soloice commented Feb 14, 2017

Thanks for your reply.
I used train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy) in my code. After I saw your comment, I changed it to slim.learning.create_train_op, but that didn't work either.

This is my current code (I've figured out that the canonical way to use a normalizer is to configure its parameters via normalizer_params, so I replaced the lambda expression with normalizer_params). The most relevant part is in the function model():

import tensorflow as tf
from tensorflow.python.ops import control_flow_ops

slim = tf.contrib.slim

def model():
    # Create the model
    x = tf.placeholder(tf.float32, [None, 784])
    keep_prob = tf.placeholder(tf.float32, [])
    y_ = tf.placeholder(tf.float32, [None, 10])
    is_training = tf.placeholder(tf.bool, [])

    x_image = tf.reshape(x, [-1, 28, 28, 1])
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        normalizer_fn=slim.batch_norm,
                        normalizer_params={'is_training': is_training}):
        conv1 = slim.conv2d(x_image, 32, [5, 5], scope='conv1')
        pool1 = slim.max_pool2d(conv1, [2, 2], scope='pool1')
        conv2 = slim.conv2d(pool1, 64, [5, 5], scope='conv2')
        pool2 = slim.max_pool2d(conv2, [2, 2], scope='pool2')
        flatten = slim.flatten(pool2)
        fc = slim.fully_connected(flatten, 1024, scope='fc1')
        drop = slim.dropout(fc, keep_prob=keep_prob)
        logits = slim.fully_connected(drop, 10, activation_fn=None, scope='logits')

    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    if update_ops:
        updates = tf.group(*update_ops)
        cross_entropy = control_flow_ops.with_dependencies([updates], cross_entropy)

    # train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    train_op = slim.learning.create_train_op(cross_entropy, optimizer)

    return {'x': x,
            'y_': y_,
            'keep_prob': keep_prob,
            'is_training': is_training,
            'train_step': train_op,
            'accuracy': accuracy,
            'cross_entropy': cross_entropy}

If FLAGS.phase is "train", the model is trained on the training set. But when I evaluate the model on the validation set and pass False to the is_training placeholder, the performance on the validation set looks weird (it should be about 98% or higher).

If FLAGS.phase is "test", the model is restored from a checkpoint and evaluated on the test set. Again, performance is really poor.

Am I passing parameters to slim.batch_norm incorrectly? How should batch normalization be used for inference?
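
For concreteness, the evaluation described above would look roughly like this; this is only a sketch, and the checkpoint path, the mnist dataset object, and the session handling are assumed for illustration rather than taken from the code above:

m = model()
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, '/path/to/checkpoint')  # hypothetical checkpoint path
    acc = sess.run(m['accuracy'],
                   feed_dict={m['x']: mnist.validation.images,
                              m['y_']: mnist.validation.labels,
                              m['keep_prob']: 1.0,        # no dropout at eval time
                              m['is_training']: False})   # use the moving averages
    print('validation accuracy: %.4f' % acc)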

@aselle aselle added the type:support label Feb 15, 2017

@aselle

Member

aselle commented Feb 15, 2017

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

@aselle aselle closed this Feb 15, 2017

@soloice

Contributor

soloice commented Feb 15, 2017

Yes, the problem I ran into is more of a StackOverflow-type question.
Still, I'm wondering whether the usage of the slim.batch_norm layer could be added to the slim README.md or somewhere else to make it clearer.

@soloice

Contributor

soloice commented Feb 16, 2017

It's my bad. I hadn't fully understood the dynamics of the batch normalization layer during training.

This is what happened.

Sorry for bothering you with my carelessness.

@aselle

Member

aselle commented Feb 16, 2017

Don't worry about it @soloice, glad you got things working.

@jacky841102

jacky841102 commented Apr 20, 2017

Hi @soloice

I also ran into a similar problem while using slim.batch_norm, and solved it by following this:

Another important thing is, be sure to use slim.learning.create_train_op to create the train op. Do not use the native tf.train.GradientDescentOptimizer(0.1).minimize(loss).

Thanks a lot! However, do you know why we need to use slim.learning.create_train_op instead of the native tf.train.GradientDescentOptimizer(0.1).minimize(loss)?

@soloice

Contributor

soloice commented Apr 21, 2017

@jacky841102 I don't know yet. I haven't checked the source code of the slim module.

@balansky

balansky commented Sep 29, 2017

@soloice
From the source code of slim.learning.create_train_op you can see that it runs the update ops for you, so I think you don't need to handle them manually when you use create_train_op:

# Update ops use the GraphKeys.UPDATE_OPS collection if update_ops is None.
global_update_ops = set(ops.get_collection(ops.GraphKeys.UPDATE_OPS))
if update_ops is None:
    update_ops = global_update_ops
else:
    update_ops = set(update_ops)
if not global_update_ops.issubset(update_ops):
    logging.warning('update_ops in create_train_op does not contain all the '
                    ' update_ops in GraphKeys.UPDATE_OPS')

# Make sure update_ops are computed before total_loss.
if update_ops:
    with ops.control_dependencies(update_ops):
        barrier = control_flow_ops.no_op(name='update_barrier')
    total_loss = control_flow_ops.with_dependencies([barrier], total_loss)
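
In other words, create_train_op adds that control dependency for you. With a plain optimizer, the equivalent manual wiring would look roughly like this (a sketch, assuming the cross_entropy loss and learning rate from the earlier snippet):

# Make the train step depend on the batch-norm moving-average updates that
# slim.batch_norm registers in tf.GraphKeys.UPDATE_OPS.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)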

@tasx0823

tasx0823 commented Dec 7, 2017

When you use slim.batch_norm, be sure to use slim.learning.create_train_op instead of tf.train.GradientDescentOptimizer(lr).minimize(loss) or another optimizer's minimize(). Try it and see if it works!

@studentSam0000

studentSam0000 commented Jan 9, 2018

@soloice @cancan101 @tasx0823 Could you please explain why using train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy) with a model defined with tf.slim.batch_norm is wrong? Thank you for pointing this out.

In contrast, I am using the native train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy) and it seems to work fine (I am fine-tuning ResNet-50).

I defined my training and validation outputs as follows:

with slim.arg_scope(self.network_arg_scope()):
    self.network_logits, _ = resnet_v1_50(inputs=inputs)
    # Define a new output for testing, but reuse the same model variables.
    self.network_logits_val, _ = resnet_v1_50(inputs=inputs, reuse=True, is_training=False)

One thing I can't explain is that I get two blocks in TensorBoard, resnet_v1_50 and resnet_v1_50_1. However, after reading your comments I now fear there is something wrong with this. Do you have any idea what is happening here?

See the TensorBoard Graph

Thank you in advance :)
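
For what it's worth, two blocks are expected with this pattern: calling the network function a second time shares the variables (because of reuse=True) but still builds a second copy of the ops, and TensorFlow uniquifies the name scope of that copy, hence resnet_v1_50_1. A toy sketch of the same effect, with illustrative scope and variable names:

import tensorflow as tf

def tiny_net(x, reuse=None):
    # Stand-in for resnet_v1_50: one shared variable, one op.
    with tf.variable_scope('net', reuse=reuse):
        w = tf.get_variable('w', shape=[1], initializer=tf.ones_initializer())
        return x * w

x = tf.placeholder(tf.float32, [None, 1])
train_out = tiny_net(x)            # ops built under the name scope 'net'
val_out = tiny_net(x, reuse=True)  # same variable 'net/w', new ops under 'net_1'

print(train_out.name)  # e.g. 'net/mul:0'
print(val_out.name)    # e.g. 'net_1/mul:0'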

@studentSam0000

studentSam0000 commented Jan 10, 2018

@soloice @cancan101 @tasx0823 Thank you for the lead. Using create_train_op gives me something very close to the validation performance I was expecting. :)

By the way, this is also explained in the documentation:

By default, slim.learning.create_train_op includes all update ops that are part of the tf.GraphKeys.UPDATE_OPS collection. Additionally, TF-Slim's slim.batch_norm function adds the moving mean and moving variance updates to this collection. Consequently, users who want to use slim.batch_norm will not need to take any additional steps in order to have the moving mean and moving variance updates be computed.

However, I still have two blocks as resnet_v1_50 and resnet_v1_50_1 in my TensorBoard visualization. Is this due to the distinct operators in the BatchNorm layer at train and validation times?
updated graph
