train_and_eval mode in detection models can't resume training #496

artyompal · 2019-09-02T16:25:40Z

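Let's say total_steps=1,000,000 and num_steps_per_eval=10,000, and training was stopped after 100,000 steps. After resuming, the loop starts again from cycle 0, so the first call is executor.train with max_steps=10,000. But the current step is already 100,000, so that call does nothing, and the same happens for every cycle whose last step is at or below the current step: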

  elif FLAGS.mode == 'train_and_eval':
    save_config(params, params.model_dir)
    executor.prepare_evaluation()
    num_cycles = int(params.train.total_steps / params.eval.num_steps_per_eval)

    # FIXME: this doesn't work with resuming
    for cycle in range(num_cycles):
      tf.logging.info('Start training cycle %d.' % cycle)
      current_cycle_last_train_step = ((cycle + 1)
                                       * params.eval.num_steps_per_eval)
      executor.train(train_input_fn, current_cycle_last_train_step)
      executor.evaluate(
          eval_input_fn,
          params.eval.eval_samples // params.predict.predict_batch_size)
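
One possible way to make the loop resume-aware is to skip the cycles that are already complete. The sketch below is not the repository's code: `remaining_cycle_end_steps` is a hypothetical helper, and it assumes the current global step has already been read from the latest checkpoint in `params.model_dir` by whatever means the executor provides.

```python
def remaining_cycle_end_steps(total_steps, steps_per_eval, current_step=0):
    """Return the last train step of every cycle that still has work to do.

    Hypothetical helper: current_step is assumed to be the global step
    restored from the latest checkpoint (0 for a fresh run).
    """
    num_cycles = total_steps // steps_per_eval
    # Cycles whose last step is <= current_step were fully trained already.
    first_cycle = current_step // steps_per_eval
    return [(cycle + 1) * steps_per_eval
            for cycle in range(first_cycle, num_cycles)]


# Numbers from the report: stopped at step 100,000 of 1,000,000.
steps = remaining_cycle_end_steps(1000000, 10000, current_step=100000)
print(steps[0], len(steps))  # 110000 90
```

The training loop could then iterate over these values and pass each one to `executor.train` as before. One trade-off: if the run was interrupted after training a cycle but before its evaluation finished, this version skips that evaluation.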
