Commit
Switch line orders in trainer so that restore_map is called after moving average variables are created.

Moving averages are now properly loaded during fine-tuning, instead of being recreated.
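The ordering bug can be sketched without TensorFlow: a restore map is a snapshot of whatever variables exist at the moment it is built, so shadow variables created afterwards never make it into the map and are freshly initialized instead of restored. The helper names (`build_restore_map`, `create_ema_shadow_vars`) and variable names below are illustrative stand-ins, not the actual trainer API.

```python
# Library-free sketch of the ordering bug fixed by this commit.
# "Variables" are just names; a restore map is a snapshot of the
# variables that exist when it is built.

def build_restore_map(existing_vars, checkpoint_vars):
    """Map every currently existing variable to its checkpoint entry."""
    return {v: v for v in existing_vars if v in checkpoint_vars}

def create_ema_shadow_vars(existing_vars):
    """Creating an EMA adds a shadow copy of each variable."""
    return existing_vars + [v + "/ExponentialMovingAverage" for v in existing_vars]

checkpoint = {"conv/weights", "conv/weights/ExponentialMovingAverage"}

# Before the fix: restore map built BEFORE the EMA variables exist.
model_vars = ["conv/weights"]
restore_map_early = build_restore_map(model_vars, checkpoint)
model_vars = create_ema_shadow_vars(model_vars)
# The shadow variable exists now, but never made it into the map,
# so it is re-initialized instead of restored from the checkpoint.
assert "conv/weights/ExponentialMovingAverage" not in restore_map_early

# After the fix: restore map built AFTER the EMA variables exist.
model_vars = create_ema_shadow_vars(["conv/weights"])
restore_map_late = build_restore_map(model_vars, checkpoint)
assert "conv/weights/ExponentialMovingAverage" in restore_map_late
```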

PiperOrigin-RevId: 190496046
pkulzc committed Apr 3, 2018
1 parent ed877c0 commit 93b8168
46 changes: 23 additions & 23 deletions research/object_detection/trainer.py
@@ -264,29 +264,6 @@ def train(create_tensor_dict_fn, create_model_fn, train_config, master, task,
          total_num_replicas=worker_replicas)
      sync_optimizer = training_optimizer

-    # Create ops required to initialize the model from a given checkpoint.
-    init_fn = None
-    if train_config.fine_tune_checkpoint:
-      if not train_config.fine_tune_checkpoint_type:
-        # train_config.from_detection_checkpoint field is deprecated. For
-        # backward compatibility, fine_tune_checkpoint_type is set based on
-        # from_detection_checkpoint.
-        if train_config.from_detection_checkpoint:
-          train_config.fine_tune_checkpoint_type = 'detection'
-        else:
-          train_config.fine_tune_checkpoint_type = 'classification'
-      var_map = detection_model.restore_map(
-          fine_tune_checkpoint_type=train_config.fine_tune_checkpoint_type,
-          load_all_detection_checkpoint_vars=(
-              train_config.load_all_detection_checkpoint_vars))
-      available_var_map = (variables_helper.
-                           get_variables_available_in_checkpoint(
-                               var_map, train_config.fine_tune_checkpoint))
-      init_saver = tf.train.Saver(available_var_map)
-      def initializer_fn(sess):
-        init_saver.restore(sess, train_config.fine_tune_checkpoint)
-      init_fn = initializer_fn

    with tf.device(deploy_config.optimizer_device()):
      regularization_losses = (None if train_config.add_regularization_loss
                               else [])
@@ -354,6 +331,29 @@ def initializer_fn(sess):
    saver = tf.train.Saver(
        keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)

+    # Create ops required to initialize the model from a given checkpoint.
+    init_fn = None
+    if train_config.fine_tune_checkpoint:
+      if not train_config.fine_tune_checkpoint_type:
+        # train_config.from_detection_checkpoint field is deprecated. For
+        # backward compatibility, fine_tune_checkpoint_type is set based on
+        # from_detection_checkpoint.
+        if train_config.from_detection_checkpoint:
+          train_config.fine_tune_checkpoint_type = 'detection'
+        else:
+          train_config.fine_tune_checkpoint_type = 'classification'
+      var_map = detection_model.restore_map(
+          fine_tune_checkpoint_type=train_config.fine_tune_checkpoint_type,
+          load_all_detection_checkpoint_vars=(
+              train_config.load_all_detection_checkpoint_vars))
+      available_var_map = (variables_helper.
+                           get_variables_available_in_checkpoint(
+                               var_map, train_config.fine_tune_checkpoint))
+      init_saver = tf.train.Saver(available_var_map)
+      def initializer_fn(sess):
+        init_saver.restore(sess, train_config.fine_tune_checkpoint)
+      init_fn = initializer_fn

    slim.learning.train(
        train_tensor,
        logdir=train_dir,

6 comments on commit 93b8168

@vs-zhehangd

Hi,
I want to point out that this switch changes the naming of the nodes, which breaks compatibility with some pretrained models. See #3922.

@varun19299

+1.

This has affected all recent models, including SSDlite_mobilenet and SSD_mobilenetv2.

Also, could the model zoo document TensorFlow version compatibility?

Currently, I believe it is 1.5+.

@pkulzc (Contributor, Author) commented on 93b8168, May 18, 2018

No, this change did not change any naming.

The warning complains about missing RMSProp values, which come from the optimizer rather than the model. It doesn't matter, because you don't need to restore optimizer parameters.

This change is actually very helpful:

  • Before this change, restoring happened early, before the moving average variables were created, so the trainer never noticed they were missing from the checkpoint and printed no warnings.
  • After this change, while restoring parameters the trainer finds that variables like ExponentialMovingAverage are not in the checkpoint, and prints warnings.

Training behavior should not be affected, since the moving averages have no impact on training. But in eval, when moving averages are enabled, metrics should pick up more quickly because the moving averages no longer have to start from 0.
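The eval point can be seen numerically. This is a minimal sketch of an exponential moving average tracking a weight that sits at 1.0; the decay of 0.99 and step count are arbitrary illustration values, not taken from the trainer config.

```python
# Sketch: why a restored EMA shadow helps eval "pick up more quickly".
# The shadow tracks a weight w via: shadow = decay * shadow + (1 - decay) * w

def run_ema(shadow, weight=1.0, decay=0.99, steps=100):
    for _ in range(steps):
        shadow = decay * shadow + (1 - decay) * weight
    return shadow

cold = run_ema(shadow=0.0)  # shadow recreated from scratch (old behavior)
warm = run_ema(shadow=1.0)  # shadow restored from the checkpoint (new behavior)

# After 100 steps the cold-started shadow still lags well below the weight
# (roughly 1 - 0.99**100), while the restored shadow has tracked it all along.
print(cold, warm)
```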

This change only makes the system aware of the missing variables; it doesn't change anything else, and the warning messages don't affect your training either.

@varun19299 TF version compatibility is already documented in the model zoo.

@varun19299

Thank you for the quick reply.

Are you sure this hasn't affected the naming? I ask because:

WARNING:root:Variable [SecondStageFeatureExtractor/InceptionResnetV2/Repeat/block8_9/Conv2d_1x1/biases/Momentum]

warns that /block8_9/Conv2d_1x1/biases/Momentum was not found in the checkpoint, when in fact (printing all the graph operations shows that) /block8_9/Conv2d_1x1/biases was found in the checkpoint.

Further, in January (4 commits prior), some users fixed this warning by re-initializing the train_dir directory; that workaround no longer works as of May. Could you please check whether the two are connected?

@pkulzc (Contributor, Author) commented on 93b8168, May 18, 2018

Yes, I'm sure it doesn't hurt anything. That Momentum variable is used by the optimizer. /block8_9/Conv2d_1x1/biases should be in the checkpoint because it's part of the network, while optimizer parameters don't need to be.
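The model-variable versus optimizer-slot distinction can be sketched as follows. The checkpoint contents are made up for illustration, and `filter_available` is a simplified stand-in mimicking the behavior of variables_helper.get_variables_available_in_checkpoint, not its actual implementation.

```python
import logging

# Hypothetical graph variables: a network weight plus the optimizer's
# per-variable Momentum slot (slot variables are named "<var>/<slot>").
graph_vars = [
    "SecondStageFeatureExtractor/InceptionResnetV2/Repeat/block8_9/Conv2d_1x1/biases",
    "SecondStageFeatureExtractor/InceptionResnetV2/Repeat/block8_9/Conv2d_1x1/biases/Momentum",
]

# A pretrained checkpoint stores model variables only, not optimizer state.
checkpoint_vars = {
    "SecondStageFeatureExtractor/InceptionResnetV2/Repeat/block8_9/Conv2d_1x1/biases",
}

def filter_available(var_names, ckpt):
    """Keep variables present in the checkpoint; warn about the rest."""
    available = []
    for name in var_names:
        if name in ckpt:
            available.append(name)
        else:
            # This is the (harmless) warning discussed in the thread.
            logging.warning("Variable [%s] not available in checkpoint", name)
    return available

available = filter_available(graph_vars, checkpoint_vars)
# Only the model variable is restored; the Momentum slot triggers a
# warning and is simply initialized from scratch, which is fine for
# fine-tuning.
```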

Making the warning disappear is not the same as fixing anything.

Moreover, we will release new interfaces soon, and this trainer file will be deprecated.

@varun19299

Yes, this makes sense.

So the model zoo checkpoints do not include optimizer parameters, right?
