
Multi-GPUs Runtime Error #3

Open
RyanG41 opened this issue Oct 11, 2023 · 1 comment

Comments


RyanG41 commented Oct 11, 2023

Hi,
Thanks for the wonderful work.
I encountered an error, possibly caused by distributed training? I ran the code on multiple GPUs and got the error below:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, .....
In train.py I see the code for multiprocessing, but I don't know how to fix it there. Alternatively, can I force the code to run on only one GPU?
Thanks for any help you can provide.
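For reference, the flag the error message suggests can be demonstrated in isolation. The sketch below (not code from this repo; the model and shapes are made up) wraps a module that has a parameter unused in `forward()` with `DistributedDataParallel(..., find_unused_parameters=True)`, using a single CPU process and the `gloo` backend so it runs without any GPU. Without that flag, the backward pass in a multi-process run raises exactly this RuntimeError.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal single-process "cluster" so DDP can initialize on CPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=0 + 1)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 1)
        self.unused = nn.Linear(4, 1)  # never called in forward()

    def forward(self, x):
        return self.used(x)  # self.unused produces no gradient

# find_unused_parameters=True tells DDP not to wait for gradients
# from parameters that did not contribute to the loss.
model = DDP(Net(), find_unused_parameters=True)
loss = model(torch.randn(2, 4)).sum()
loss.backward()  # succeeds despite the unused parameter

dist.destroy_process_group()
```

As for forcing a single GPU: restricting visibility before launch (e.g. `CUDA_VISIBLE_DEVICES=0 python train.py`) is a common workaround, though whether train.py's multiprocessing setup tolerates a world size of 1 depends on how the repo's launcher is written.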


eriche2016 commented Dec 6, 2023

@RyanG41 Same issue here. Have you solved it?
