train_and_eval mode in detection models can't resume training #496

Open
artyompal opened this issue Sep 2, 2019 · 0 comments

Suppose total_steps=1,000,000 and num_steps_per_eval=10,000, and training was stopped after 100,000 steps. After resuming, the loop starts again from cycle 0 and calls executor.train with max_steps=10,000. The current step is already 100,000, so that call does nothing:

  elif FLAGS.mode == 'train_and_eval':
    save_config(params, params.model_dir)
    executor.prepare_evaluation()
    num_cycles = int(params.train.total_steps / params.eval.num_steps_per_eval)

    # FIXME: this doesn't work with resuming
    for cycle in range(num_cycles):
      tf.logging.info('Start training cycle %d.' % cycle)
      current_cycle_last_train_step = ((cycle + 1)
                                       * params.eval.num_steps_per_eval)
      executor.train(train_input_fn, current_cycle_last_train_step)
      executor.evaluate(
          eval_input_fn,
          params.eval.eval_samples // params.predict.predict_batch_size)
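
One possible way to make the loop resume-aware is to skip the cycles that are already complete. The sketch below is only an illustration of that arithmetic, not the repository's fix: the helper name `remaining_cycle_end_steps` and its `current_step` argument are hypothetical, and how the current global step is read from the checkpoint in `model_dir` is left out.

```python
def remaining_cycle_end_steps(current_step, total_steps, num_steps_per_eval):
    """Return the max_steps target of every cycle that still has work to do.

    Hypothetical helper: current_step is assumed to be the global step
    restored from the latest checkpoint (0 for a fresh run).
    """
    num_cycles = total_steps // num_steps_per_eval
    # A cycle whose last step is <= current_step is already finished,
    # so the first cycle to run is the one containing current_step.
    first_cycle = current_step // num_steps_per_eval
    return [(cycle + 1) * num_steps_per_eval
            for cycle in range(first_cycle, num_cycles)]


# Numbers from the example above: stopped after 100,000 of 1,000,000 steps.
targets = remaining_cycle_end_steps(100000, 1000000, 10000)
print(targets[0], targets[-1], len(targets))
```

With such a list, the loop body could call executor.train(train_input_fn, target) followed by executor.evaluate(...) for each target, so the first call after resuming trains up to step 110,000 instead of returning immediately.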