During training, the game threads spend most of their time idle, waiting for an inference from the TF/model worker thread, which has to divide its time between learning and inference. This creates a bottleneck that slows down training. A more distributed, asynchronous approach to serving inferences should improve utilization.
Have each game worker keep its own copy of the model for inference during evaluation games (and possibly also rollouts), synced periodically from the learner (see the sketch below). These copies should use the CPU version of TensorFlow; the per-worker batch and model sizes are small enough that this shouldn't cost much performance, and it could even be a net gain if it removes the current bottleneck.
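A minimal sketch of what a per-worker model copy could look like. The class, its methods, and the `build_model_fn` / `fetch_weights_fn` callables are assumptions for illustration, not existing names in the repo; the actual sync transport (queue, pipe, checkpoint file) is left open.

```python
import os
# Force this worker process onto CPU before TensorFlow initializes any devices
# (assumption: workers run as separate processes, so this is safe to set here).
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf


class LocalInferenceModel:
    """Per-worker copy of the model, used only for inference and periodically
    refreshed with weights from the learner."""

    def __init__(self, build_model_fn, sync_every_n_games=20):
        # build_model_fn is assumed to build the same architecture the learner trains.
        self.model = build_model_fn()
        self.sync_every_n_games = sync_every_n_games
        self.games_since_sync = 0

    def maybe_sync(self, fetch_weights_fn):
        """Call once per finished game; pulls fresh weights when the counter rolls over."""
        self.games_since_sync += 1
        if self.games_since_sync >= self.sync_every_n_games:
            self.model.set_weights(fetch_weights_fn())
            self.games_since_sync = 0

    def predict(self, observation):
        # Per-worker batches are tiny, so plain CPU forward passes should be fine here.
        return self.model(observation[tf.newaxis, ...], training=False)
```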
If the rollout workers also run local models, they should send experience data to the learner themselves, doing more of the preprocessing on the worker side (e.g. computing n-step returns, as sketched below), which fits the goal of offloading work from the learner.
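A rough sketch of the kind of preprocessing a rollout worker could do before shipping experience to the learner. The function name, signature, and defaults are assumptions; it just computes truncated n-step return targets for one rollout segment.

```python
import numpy as np


def nstep_returns(rewards, values, gamma=0.99, n=5):
    """Compute n-step return targets on the rollout worker.

    rewards: rewards r_0 .. r_{T-1} from one rollout segment
    values:  value estimates V(s_0) .. V(s_T), where V(s_T) is the bootstrap
             value for the state after the last step (use 0 if terminal)
    Returns G_t = sum_{k<n} gamma^k * r_{t+k} + gamma^n * V(s_{t+n}),
    with the window truncated at the end of the segment.
    """
    rewards = np.asarray(rewards, dtype=np.float32)
    values = np.asarray(values, dtype=np.float32)
    T = len(rewards)
    returns = np.zeros(T, dtype=np.float32)
    for t in range(T):
        end = min(t + n, T)       # n-step window, clipped at the segment end
        g = values[end]           # bootstrap from V(s_{t+n}) or V(s_T)
        for k in reversed(range(t, end)):
            g = rewards[k] + gamma * g
        returns[t] = g
    return returns
```

The worker would then send `(observations, actions, returns)` batches instead of raw transitions, so the learner only has to run gradient updates.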
The same could likely be done for the model comparison and/or psbot scripts, but some profiling might be needed first.