
Implement multi-step learning #354

Closed
taylorhansen opened this issue Jan 27, 2023 · 0 comments
Assignees: taylorhansen
Labels: feature (New addition to functionality), training (Has to do with the training script)

Comments

taylorhansen commented Jan 27, 2023

Should help strike a balance between the MC and TD(1) learning discussed in #353. Can later also try the TD(λ) method.

Would require a lot more tracking on either the game worker or TF worker side; leaning towards the TF worker, since doing it there can simplify the current process of generating experience and reduce the passing of buffers back and forth between threads.
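
For reference, a minimal sketch of how an n-step target could be computed on the TF worker side from buffered rewards plus a bootstrapped value estimate (hypothetical helper, not code from this repo):

```ts
/**
 * Computes the n-step TD learning target
 *   G_t^(n) = r_t + γ·r_{t+1} + ... + γ^(n-1)·r_{t+n-1} + γ^n·V(s_{t+n}).
 * With a single reward this is the usual 1-step TD target; letting the
 * reward list run to the end of the episode (with a zero bootstrap value)
 * recovers the full MC return.
 */
function nStepTarget(
    rewards: number[],
    bootstrapValue: number,
    gamma: number,
): number {
    let target = bootstrapValue;
    // Fold rewards from the end so each earlier reward picks up another
    // factor of gamma.
    for (let i = rewards.length - 1; i >= 0; --i) {
        target = rewards[i] + gamma * target;
    }
    return target;
}

// Example: 3-step target with γ = 0.99 and a bootstrapped value of 0.5.
const target = nStepTarget([1, 0, 2], 0.5, 0.99);
```

The step count n is then the knob that trades off the bias of the bootstrapped estimate against the variance of the full MC return.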

taylorhansen added the feature and training labels on Jan 27, 2023
taylorhansen self-assigned this on Jan 27, 2023
taylorhansen added a commit that referenced this issue Jan 27, 2023
Close #353.

For now, replace the Monte Carlo (MC) method, which takes the total
discounted reward sum (i.e. the return) as the learning target, with a
1-step temporal difference (TD(1)) method, which processes experiences
as they come in and uses its own biased estimate of the value function
as the learning target.

May add back the option for MC learning later in the form of TD(n)
support from #354.

Add config for target network and double Q learning with target net.
Also move experience config out of rollout for better grouping.
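
As a rough illustration of what the double-Q target with a separate target network computes (a sketch with assumed names and array shapes, not the repo's actual code):

```ts
/**
 * One-step double-DQN target:
 *   y = r + γ·Q_target(s', argmax_a Q_online(s', a)), or just r if terminal.
 * The online network selects the action while the target network evaluates
 * it, reducing the overestimation bias of plain Q-learning.
 */
function doubleDqnTarget(
    reward: number,
    done: boolean,
    nextQOnline: number[], // Q_online(s', ·)
    nextQTarget: number[], // Q_target(s', ·)
    gamma: number,
): number {
    if (done) {
        return reward;
    }
    // Greedy action according to the online network.
    let bestAction = 0;
    for (let a = 1; a < nextQOnline.length; ++a) {
        if (nextQOnline[a] > nextQOnline[bestAction]) {
            bestAction = a;
        }
    }
    // Evaluate that action with the periodically-synced target network.
    return reward + gamma * nextQTarget[bestAction];
}
```

The target network's weights are only copied from the online network every so many learning steps, which keeps the bootstrap target stable in between.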
taylorhansen added a commit that referenced this issue Jan 28, 2023
Improvement on #353 and setup for #354.

Rewrite training algorithm (again) to remove the concept of episodes and
instead focus on pure learning steps according to the DQN algorithm.
Also add a proper replay buffer implementation.
taylorhansen added a commit that referenced this issue Jan 29, 2023
Improvement on #353 and setup for #354.

Rewrite training algorithm (again) to remove the concept of episodes and
instead focus on pure learning steps according to the DQN algorithm.
Also add a proper replay buffer implementation.
taylorhansen added a commit that referenced this issue Jan 29, 2023
Improvement on #353 and setup for #354.

Rewrite training algorithm (again) to remove the concept of episodes and
instead focus on pure learning steps according to the DQN algorithm.
Also add a proper replay buffer implementation.

Add/rewrite some configs/metrics code to mesh with above.

Also reorganize source tree, general housekeeping.
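
A replay buffer in this setting is essentially a fixed-capacity ring buffer with uniform sampling; a minimal generic sketch (illustrative only, not the repo's actual class):

```ts
/** Minimal fixed-capacity replay buffer with uniform random sampling. */
class ReplayBuffer<T> {
    private readonly buffer: T[] = [];
    private next = 0;

    public constructor(private readonly capacity: number) {}

    /** Adds an experience, overwriting the oldest one once at capacity. */
    public add(exp: T): void {
        if (this.buffer.length < this.capacity) {
            this.buffer.push(exp);
        } else {
            this.buffer[this.next] = exp;
        }
        this.next = (this.next + 1) % this.capacity;
    }

    /** Samples a batch uniformly at random (with replacement). */
    public sample(batchSize: number): T[] {
        const batch: T[] = [];
        for (let i = 0; i < batchSize; ++i) {
            const j = Math.floor(Math.random() * this.buffer.length);
            batch.push(this.buffer[j]);
        }
        return batch;
    }

    /** Number of experiences currently stored. */
    public get size(): number {
        return this.buffer.length;
    }
}
```

Learning steps then draw a random minibatch from the buffer rather than consuming whole episodes, which is what decouples the training loop from episode boundaries.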