Should help with striking a balance between MC and TD(n) learning from #353. Could also try the TD(λ) method later.
This would require a lot more tracking on either the game-worker or TF-worker side; leaning towards the TF worker, since it can simplify the current process of generating experience and reduce the passing of buffers back and forth between threads.
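As a rough sketch of the target such TF-worker-side tracking would compute, assuming transitions arrive as rewards plus a bootstrap value estimate for the state n steps ahead (all names here are hypothetical, not from the repo):

```python
def n_step_target(rewards, bootstrap_value, gamma, done):
    """n-step TD target: discounted sum of the next n rewards, plus a
    discounted value estimate of the state reached after them.

    With n = 1 this is the 1-step TD target; with n spanning the whole
    episode (done=True) it degenerates to the MC return, which is the
    balance this issue is about.
    """
    # Start from v(s_{t+n}) unless the episode ended inside the window.
    target = 0.0 if done else bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```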
Close #353.
For now, replace the Monte Carlo (MC) method, which takes the total
discounted reward sum (i.e. the return) as the learning target, with a
1-step temporal-difference (TD(1)) method, which processes experiences
as they come in and uses the network's own biased estimate of the value
function as the learning target.
May add back the option for MC learning later in the form of TD(n)
support from #354.
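A minimal sketch of the two learning targets being swapped here (function names and shapes are illustrative, not the repo's actual API):

```python
import numpy as np

def mc_targets(rewards, gamma):
    """MC targets: the full discounted return from each step, which can
    only be computed once the episode has finished."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def td1_target(reward, next_value, gamma, done):
    """TD(1) target: bootstraps from the network's own (biased) value
    estimate, so it can be computed as each experience comes in."""
    return reward + (0.0 if done else gamma * next_value)
```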
Add config for a target network and for double Q-learning with the
target net.
Also move the experience config out of rollout for better grouping.
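A sketch of the double-Q target this config would enable, assuming `online_q` and `target_q` are callables returning per-action Q-values for a batch of states (interfaces assumed, not the repo's):

```python
import numpy as np

def double_q_targets(rewards, next_states, dones, gamma, online_q, target_q):
    """Double Q-learning with a target net: the online network selects
    the next action, the periodically synced target network evaluates
    it, reducing the overestimation bias of plain max-based targets.

    dones is a 0/1 float array marking terminal transitions.
    """
    next_actions = np.argmax(online_q(next_states), axis=1)
    next_values = target_q(next_states)[np.arange(len(next_states)), next_actions]
    return rewards + gamma * (1.0 - dones) * next_values
```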
Improvement on #353 and setup for #354.
Rewrite training algorithm (again) to remove the concept of episodes and
instead focus on pure learning steps according to the DQN algorithm.
Also add a proper replay buffer implementation.
Add/rewrite some configs/metrics code to mesh with above.
Also reorganize source tree, general housekeeping.
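For illustration, a minimal uniform replay buffer along the lines of what this commit describes (the repo's actual implementation may differ):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s', done) transitions,
    sampled uniformly at random for each DQN learning step."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Decoupling learning from episodes then amounts to: push each incoming transition into the buffer and, once it holds enough samples, draw a random batch and take one gradient update per learning step.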