
multiple gpu when running 'sjSDM' #63

Closed

chnpenny opened this issue Feb 4, 2021 · 1 comment

Comments

@chnpenny (Collaborator) commented Feb 4, 2021

Hi,
We noticed that the 'n_gpu' argument is available only in 'sjSDM_cv', not in 'sjSDM'. Is it possible to run the 'sjSDM' function on multiple GPUs at all? If so, has this been implemented in 'sjSDM' yet?
Thanks a lot!

@MaximilianPi (Member) commented
Hi,
no, it is not supported, and I'm not sure it would be worth the time. Let me explain; there are two different scenarios here:
a) sjSDM_cv is used to train up to hundreds of models. Distributing this workload across several GPUs is very favorable here, particularly for small models / datasets, because several small models can run at the same time on the same GPU. An example: 3 GPUs, 21 tuning steps, and 10-fold CV mean we have to fit 21 × 10 = 210 models. If they are small, sjSDM_cv can run 21 parallel workers to train 21 models simultaneously, i.e. 7 models on each GPU (a sketch follows below).
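For illustration, a minimal R sketch of scenario a). The tuning setup mirrors the numbers above; `simulate_SDM()` only provides toy data, and the `n_gpu` / `n_cores` arguments follow the `sjSDM_cv()` interface discussed in this thread (check `?sjSDM_cv` for the exact signature of your installed version):

```r
library(sjSDM)

# Toy community data: 3 environmental predictors, 10 species, 100 sites
com <- simulate_SDM(env = 3L, species = 10L, sites = 100L)

tune <- sjSDM_cv(
  Y          = com$response,    # species occurrence matrix
  env        = com$env_weights, # environmental covariates
  tune_steps = 21L,             # 21 hyperparameter candidates
  CV         = 10L,             # 10-fold CV -> 21 * 10 = 210 model fits
  n_gpu      = 3L,              # distribute the fits over 3 GPUs
  n_cores    = 21L              # 21 parallel workers, i.e. 7 per GPU
)
```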
b) sjSDM fits only one model. In principle, a single model can be trained on more than one GPU (see https://www.tensorflow.org/tutorials/distribute/keras or https://pytorch.org/tutorials/beginner/dist_overview.html). There, the training of the model itself is distributed, but this adds communication overhead, and it is usually only worthwhile for very large data and models. As long as you can fit one model within minutes, I don't think distributed learning is necessary (a sketch follows below).
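And a minimal sketch of scenario b): a single `sjSDM()` fit runs on one device, selected via the `device` argument (this assumes the `device` and `iter` arguments of the current R interface; multi-GPU data parallelism as in the linked tutorials is not exposed here):

```r
library(sjSDM)

# Same toy data as above
com <- simulate_SDM(env = 3L, species = 10L, sites = 100L)

model <- sjSDM(
  Y      = com$response,
  env    = com$env_weights,
  iter   = 100L,      # default number of optimization steps
  device = "gpu"      # run this single fit on one GPU ("cpu" is the default)
)
```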

Cheers,
Max
