
Learner.run got stuck #782
Closed
Rejuy opened this issue Oct 10, 2022 · 3 comments

Rejuy commented Oct 10, 2022

Dear authors:
Thanks for designing tf_agents! I've run into a problem while running the code of the Google Research project circuit-training. It creates a Learner object (call it learner). When learner.run is called, it invokes learner._train. I added some logging to help debug. All of the log lines inside learner._train were printed, suggesting that _train finished. However, in learner.run, the log line right after the call to learner._train was never printed, which means the call never actually returned (the whole training process got stuck). How can this happen? I have no idea what is going on. Could you give me some advice? Thanks a lot!

  def run(self, iterations=1, iterator=None, parallel_iterations=10):
    """..."""
    # do things...
    with self.train_summary_writer.as_default(), \
         common.soft_device_placement(), \
         tf.compat.v2.summary.record_if(_summary_record_if), \
         self.strategy.scope():
      iterator = iterator or self._experience_iterator
      loss_info = self._train(tf.constant(iterations),
                              iterator,
                              parallel_iterations)
      logging.info("return back to run")  # never printed out
      train_step_val = self.train_step.numpy()
      for trigger in self.triggers:
        trigger(train_step_val)

      return loss_info

  @common.function(autograph=True)
  def _train(self, iterations, iterator, parallel_iterations):
    # ...
    logging.info("_train start")  # printed out
    # do things
    logging.info("_train end")  # printed out
    return reduced_loss_info
@misc{CircuitTraining2021,
  title = {{Circuit Training}: An open-source framework for generating chip floor plans with distributed deep reinforcement learning},
  author = {Guadarrama, Sergio and Yue, Summer and Boyd, Toby and Jiang, Joe Wenjie and Songhori, Ebrahim and Tam, Terence and Mirhoseini, Azalia},
  howpublished = {\url{https://github.com/google_research/circuit_training}},
  url = {https://github.com/google_research/circuit_training},
  year = {2021},
  note = {[Online; accessed 21-December-2021]}
}
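
One detail worth noting about the symptom above: because _train is wrapped in @common.function (a tf.function wrapper), Python-side calls such as logging.info run only while the function is being traced into a graph, not on every execution. So "_train start" and "_train end" being printed does not by itself prove that the traced graph ran to completion; the graph can still block afterwards, for example on a dataset iterator waiting for data. A minimal sketch of that tracing behavior, using a plain tf.function (names are illustrative, not from tf_agents):

import logging
import tensorflow as tf

logging.basicConfig(level=logging.INFO)

@tf.function
def double(x):
  # Python-side effects such as logging run at trace time only; they
  # are not baked into the graph that executes afterwards.
  logging.info("tracing double")
  return x * 2

double(tf.constant(1))  # first call traces the graph: the log line appears
double(tf.constant(2))  # same input signature, graph is reused: no log line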
sguada (Member) commented Oct 10, 2022

Can you make sure that the actors are generating the data that the learner needs?

For instance, can you get data by calling

next(learner._experience_iterator)
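
For context, a hedged sketch of that kind of check, assuming the learner's experience dataset is backed by a Reverb replay buffer (the address "localhost:8000" is illustrative, not taken from this issue):

import reverb

# Inspect the tables backing the learner's experience dataset. If a
# table's current_size never reaches its rate limiter's sampling
# threshold, sampling blocks, and so does the dataset iterator that
# _train consumes.
client = reverb.Client("localhost:8000")
for name, info in client.server_info().items():
  print(name, info.current_size, info.rate_limiter_info)

# If this call blocks as well, the actors are not writing experience yet.
sample = next(learner._experience_iterator)

A rate-limited table that never reaches its sampling threshold blocks samplers silently, which matches the stuck-without-error symptom described above.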

Rejuy (Author) commented Oct 19, 2022

I solved this problem by changing some parameters in the program. Thanks a lot!

> Can you make sure that the actors are generating the data that the learner needs?
>
> For instance, can you get data by calling
>
> next(learner._experience_iterator)

Rejuy closed this as completed Oct 19, 2022
kikushah commented Jan 7, 2024

@Rejuy - I am facing the same issue; can you share how you solved it?

@sguada - next(learner._experience_iterator) is generating data. But at the end of the _train function, return reduced_loss_info never actually returns.
