
About multi-gpu training #2

Open
d-li14 opened this issue Jun 21, 2020 · 5 comments

Comments

@d-li14

d-li14 commented Jun 21, 2020

Thanks for your awesome work! Is there any plan to support multi-GPU training? Training ResNet-101 on ImageNet with a single GPU is unacceptably slow.

@thomasverelst
Owner

thomasverelst commented Jun 21, 2020

Hi, thanks for having a look at the code. I did not test dual-GPU training, and RN101 indeed takes quite some time on a single GPU (~2 weeks). I did not put in the effort of implementing multi-GPU support, since I needed the other available GPUs in our lab for other runs/experiments.
I suspect some changes are needed in the loss. I was planning to look at it in the coming weeks anyway, so I'll let you know!

I also plan to release a trained mobilenetv2 with the optimized CUDA code integrated.

@d-li14
Author

d-li14 commented Jun 22, 2020

Hi, @thomasverelst
Thanks for your prompt reply and for sharing! I understand your concern about computational resources, but two weeks is still a fairly long experimental cycle :).

Furthermore, I have attempted multi-GPU training by simply wrapping the model with torch.nn.DataParallel, but got stuck on a couple of issues:

  • gathering the output dict meta across GPUs (I may have solved this already)
  • the weights of the self-constructed tensors here probably cannot be replicated from GPU 0 to the other GPUs

Looking forward to your good news! Also congratulations on the upcoming MobileNetV2 CUDA code!
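
For reference, the wrapping attempt looks roughly like the sketch below; `build_model` and the `(out, meta)` return signature are placeholders, not the actual dynconv API:

```python
import torch
import torch.nn as nn

# Sketch of the DataParallel attempt described above; `build_model` and the
# (out, meta) return signature are placeholders, not the real dynconv API.
model = build_model()                  # hypothetical constructor
model = nn.DataParallel(model).cuda()  # replicate on all visible GPUs, split along batch dim 0

inputs = torch.randn(64, 3, 224, 224).cuda()
out, meta = model(inputs)
# Plain tensor outputs are gathered onto GPU 0 automatically; a dict like `meta`
# is only gathered cleanly if every value is a tensor with a batch dimension,
# which is where the two issues listed above come from.
```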

@thomasverelst
Owner

I've pushed a new branch, multigpu. I haven't tested training accuracy yet, but it runs. The only problem I had was gathering the output dict meta. I considered subclassing DataParallel to support meta, but decided to just change the internals so PyTorch wouldn't complain.
Note that the pretrained checkpoints are different from the master branch (url in README).
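
For reference, the subclassing approach mentioned above would look roughly like this sketch; it assumes meta is a flat dict of per-sample tensors and only illustrates the idea, it is not what the multigpu branch actually does:

```python
import torch.nn as nn
from torch.nn.parallel.scatter_gather import gather

class DictGatherDataParallel(nn.DataParallel):
    """Illustration only: gather a (tensor, dict-of-tensors) output
    from the per-GPU replicas back onto the output device."""
    def gather(self, outputs, output_device):
        # `outputs` holds one (out, meta) pair per replica
        outs = gather([o for o, _ in outputs], output_device, dim=self.dim)
        metas = [m for _, m in outputs]
        meta = {k: gather([m[k] for m in metas], output_device, dim=self.dim)
                for k in metas[0]}
        return outs, meta
```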

@d-li14
Author

d-li14 commented Jun 23, 2020

Yeah, it seems to work now. I have successfully run this branch with ResNet-32 on CIFAR for fast prototyping (accuracy matches and FLOPs are reduced). As an additional note, the problem of FLOPs being counted as zero can be solved by changing the following line
https://github.com/thomasverelst/dynconv/blob/multigpu/classification/main_cifar.py#L204
model = flopscounter.add_flops_counting_methods(model)
to
model = flopscounter.add_flops_counting_methods(model.module), since DataParallel wraps the underlying model in .module.
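
In context (sketch only, using the same variables as the line above):

```python
# `model` is wrapped in nn.DataParallel, so the underlying network lives in
# model.module; attaching the FLOPs counter there (rather than to the wrapper)
# is what makes the counts non-zero again.
model = flopscounter.add_flops_counting_methods(model.module)
```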

@thomasverelst
Owner

Thanks a lot, that fixed it.
