Use multiple threads for inference during training #359

Closed
taylorhansen opened this issue Mar 17, 2023 · 0 comments

taylorhansen commented Mar 17, 2023

During training, the game threads spend most of their time idle, waiting for an inference from the TF/model worker thread. That single thread has to divide its time between learning and inference, which creates a bottleneck that can slow down training. A more distributed, asynchronous approach to serving inferences should improve utilization.

Make each game worker keep its own copy of the model for inference during evaluation games (and maybe also rollout games), periodically synced when appropriate. The workers should use the CPU version of TensorFlow: the batch and model sizes on each instance are small enough that this shouldn't cost much performance, and it could even be a net gain if it removes the current bottleneck.
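A rough sketch of the per-worker copy with periodic syncing, to make the idea concrete. Everything here is hypothetical: `WeightStore`, `GameWorker`, and `sync_every` are illustrative names, the weights are plain lists of floats, and the "forward pass" is a dot product standing in for a real model. With a Keras model the copy step would presumably go through `get_weights()`/`set_weights()` instead.

```python
import threading


class WeightStore:
    """Hypothetical learner-side holder for the latest published weights."""

    def __init__(self, weights):
        self._lock = threading.Lock()
        self._weights = list(weights)
        self._version = 0

    def publish(self, weights):
        """Called by the learner after it has updated the model."""
        with self._lock:
            self._weights = list(weights)
            self._version += 1

    def snapshot(self):
        """Return the current version and a copy of the weights atomically."""
        with self._lock:
            return self._version, list(self._weights)


class GameWorker:
    """Hypothetical game worker that runs inference on a private copy."""

    def __init__(self, store, sync_every):
        self._store = store
        self._sync_every = sync_every  # assumed policy: sync every N inferences
        self._steps = 0
        self.version, self.weights = store.snapshot()

    def sync(self):
        """Pull newer weights from the store, if any were published."""
        version, weights = self._store.snapshot()
        if version != self.version:
            self.version, self.weights = version, weights

    def infer(self, features):
        """Local inference; only touches the shared store on sync steps."""
        self._steps += 1
        if self._steps % self._sync_every == 0:
            self.sync()
        # Stand-in for a model forward pass.
        return sum(w * x for w, x in zip(self.weights, features))
```

Between syncs a worker never blocks on the learner, at the cost of playing with weights that may be up to `sync_every` inferences stale. How much staleness is acceptable would need to be tuned.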

If the rollout workers also do this, they should send experience data to the learner manually, but with more of the preprocessing (e.g. n-step returns) done on the worker side, in line with the idea of offloading work from the learner.
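For reference, a minimal sketch of the kind of n-step return preprocessing a rollout worker could do before sending experience. The function name and argument layout are assumptions, not the current training code. It also assumes the trajectory ends in a terminal state (so nothing is bootstrapped past the last step) and that `values[t]` is the worker's own value estimate for state `t`.

```python
def n_step_returns(rewards, values, n, gamma):
    """Compute the n-step return for every step of one finished episode.

    G_t = r_t + gamma * r_{t+1} + ... + gamma^(n-1) * r_{t+n-1}
          + gamma^n * values[t+n]   (only if step t+n exists)
    """
    num_steps = len(rewards)
    returns = []
    for t in range(num_steps):
        end = min(t + n, num_steps)
        g = 0.0
        # Discounted sum of up to n rewards starting at step t.
        for k in range(t, end):
            g += gamma ** (k - t) * rewards[k]
        # Bootstrap from the value estimate unless the episode ended first.
        if end < num_steps:
            g += gamma ** n * values[end]
        returns.append(g)
    return returns


# Example: only the final step is rewarded.
print(n_step_returns([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], n=2, gamma=0.5))
```

The learner would then receive ready-made targets alongside each state and action instead of computing them itself.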

The same could likely be done for the model comparison and/or psbot scripts, but some profiling might be needed first.

taylorhansen added the enhancement (Something should be changed) and training (Has to do with the training script) labels Mar 17, 2023
taylorhansen self-assigned this Mar 17, 2023
taylorhansen changed the title from "Use CPU for inference during training" to "Use multiple threads for inference during training" Mar 24, 2023