
Implement TD learning #353

Closed

taylorhansen opened this issue Jan 26, 2023 · 0 comments

Labels: enhancement (Something should be changed), training (Has to do with the training script)

Comments

@taylorhansen (Owner):
The current Monte Carlo approach to computing reward targets appears to have far too much variance due to the inherent randomness of the game and its starting points, as well as the two-player aspect. Rewrite the training algorithm to use temporal difference (TD) learning instead, which trains directly on individual reward/next-state transitions and uses a target network to improve stability.
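
For illustration, here is a minimal sketch (plain Python, not the repository's actual code) of the difference between a full Monte Carlo return target and a one-step TD target bootstrapped from a target network; `target_net` is a hypothetical callable returning per-action Q-values.

```python
# Illustrative only: Monte Carlo return vs. one-step TD target.
# All names here are placeholders, not taken from the repository.
import numpy as np


def monte_carlo_return(rewards: list[float], gamma: float = 0.99) -> float:
    """Full-episode discounted return; unbiased but high-variance."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def td_target(reward: float, next_state: np.ndarray, done: bool,
              target_net, gamma: float = 0.99) -> float:
    """One-step TD target: bootstrap from a slowly-updated target network
    instead of waiting for the full episode return."""
    if done:
        return reward
    # target_net(next_state) is assumed to return Q-values for each action.
    return reward + gamma * float(np.max(target_net(next_state)))
```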

taylorhansen added the enhancement and training labels on Jan 26, 2023
taylorhansen self-assigned this on Jan 26, 2023
taylorhansen added a commit that referenced this issue Jan 28, 2023
Improvement on #353 and setup for #354.

Rewrite training algorithm (again) to remove the concept of episodes and
instead focus on pure learning steps according to the DQN algorithm.
Also add a proper replay buffer implementation.
taylorhansen added a commit that referenced this issue Jan 29, 2023
Improvement on #353 and setup for #354.

Rewrite training algorithm (again) to remove the concept of episodes and
instead focus on pure learning steps according to the DQN algorithm.
Also add a proper replay buffer implementation.

Add/rewrite some configs/metrics code to mesh with above.

Also reorganize source tree, general housekeeping.
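
As a rough illustration of the episode-free setup described in that commit message, the sketch below shows a uniform replay buffer and a single DQN-style learning step. The class, the `online_net`/`target_net`/`optimizer_step` callables, and all hyperparameters are hypothetical placeholders rather than the repository's implementation.

```python
# Minimal sketch of a uniform replay buffer and one DQN-style learning step.
# Names, shapes, and hyperparameters are illustrative assumptions.
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        # Oldest transitions are evicted automatically once capacity is hit.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def learning_step(online_net, target_net, optimizer_step, buffer: ReplayBuffer,
                  batch_size: int = 32, gamma: float = 0.99):
    """One gradient update on a sampled batch; no episode bookkeeping needed."""
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)
    # Bootstrapped targets from the target network.
    next_q = np.max(target_net(next_states), axis=-1)
    targets = rewards + gamma * (1.0 - dones) * next_q
    # Q-values of the actions actually taken, from the online network.
    q_taken = online_net(states)[np.arange(batch_size), actions]
    td_errors = targets - q_taken
    # optimizer_step stands in for the framework-specific gradient update.
    optimizer_step(td_errors)
```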