Should we check for termination in collect step in the dqn_tutorial colab? #47

Closed
siavash-khodadadeh opened this issue Mar 21, 2019 · 1 comment

@siavash-khodadadeh

I am looking into this colab document.

I was wondering whether we should check for termination in the Data Collection section:

def collect_step(environment, policy):
  time_step = environment.current_time_step()
  action_step = policy.action(time_step)
  next_time_step = environment.step(action_step.action)
  traj = trajectory.from_transition(time_step, action_step, next_time_step)

  # Add trajectory to the replay buffer
  replay_buffer.add_batch(traj)


for _ in range(initial_collect_steps):
  collect_step(train_env, random_policy)

I guess we need to check whether the episode has terminated here. Something like this:

def collect_step(environment, policy):
  time_step = environment.current_time_step()
  action_step = policy.action(time_step)
  next_time_step = environment.step(action_step.action)
  traj = trajectory.from_transition(time_step, action_step, next_time_step)
  if next_time_step.is_last():
    environment.reset()

  # Add trajectory to the replay buffer
  replay_buffer.add_batch(traj)


for _ in range(initial_collect_steps):
  collect_step(train_env, random_policy)
@kbanoop
Contributor

kbanoop commented Mar 21, 2019

Our environments are auto-resetting. So at the end of an episode, if you call step again, it will ignore the action and perform a reset instead.
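
To make the auto-reset behavior concrete, here is a minimal pure-Python sketch. Everything in it (`ToyAutoResetEnv`, `collect_steps`, the string step types, the fixed episode length) is hypothetical and only mimics the behavior described above; it is not the TF-Agents implementation.

```python
# Toy illustration only: a hypothetical environment that mimics the
# auto-reset behavior described above. Not the TF-Agents implementation.

FIRST, MID, LAST = "FIRST", "MID", "LAST"


class ToyAutoResetEnv:
    """Episodes last `episode_length` steps; stepping past the end resets."""

    def __init__(self, episode_length=3):
        self._episode_length = episode_length
        self._t = 0
        self._current = None  # (step_type, observation), None before first reset
        self.received_actions = []  # actions that actually reached the dynamics

    def reset(self):
        self._t = 0
        self._current = (FIRST, self._t)
        return self._current

    def current_time_step(self):
        if self._current is None:
            return self.reset()
        return self._current

    def step(self, action):
        # Auto-reset: if no episode is running, or the previous time step
        # ended one, ignore the action and start a new episode instead.
        if self._current is None or self._current[0] == LAST:
            return self.reset()
        self.received_actions.append(action)
        self._t += 1
        step_type = LAST if self._t >= self._episode_length else MID
        self._current = (step_type, self._t)
        return self._current


def collect_steps(env, num_steps):
    """Collect (step type, next step type) pairs without any manual reset."""
    transitions = []
    for i in range(num_steps):
        time_step = env.current_time_step()
        next_time_step = env.step(i)  # the action is just the loop index here
        transitions.append((time_step[0], next_time_step[0]))
    return transitions


env = ToyAutoResetEnv(episode_length=3)
transitions = collect_steps(env, 8)
```

In this toy, the call to `step` that follows a `LAST` time step yields a `(LAST, FIRST)` pair, and the action passed to it never reaches the dynamics. That is why a collect loop written against such an environment does not need an explicit `reset()` call.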

@kbanoop kbanoop closed this as completed Mar 21, 2019