
Multi-GPUs Runtime Error #3

Open
RyanG41 opened this issue Oct 11, 2023 · 1 comment

Comments


RyanG41 commented Oct 11, 2023

Hi,
Thanks for the wonderful work.
I encountered an error, possibly caused by distributed training? I ran the code on multiple GPUs and got the error below:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, .....
In train.py I see the code for multiprocessing, but I don't know how to fix it there. Alternatively, can I force the code to run on only one GPU?
Thanks for any help you can provide.
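For reference, the flag the error message suggests can be demonstrated in isolation. The sketch below (not code from this repo; the model and shapes are made up) wraps a module that has a parameter unused in `forward()` with `DistributedDataParallel(..., find_unused_parameters=True)`, using a single CPU process and the `gloo` backend so it runs without any GPU. Without that flag, the backward pass in a multi-process run raises exactly this RuntimeError.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal single-process "cluster" so DDP can initialize on CPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=0 + 1)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 1)
        self.unused = nn.Linear(4, 1)  # never called in forward()

    def forward(self, x):
        return self.used(x)  # self.unused produces no gradient

# find_unused_parameters=True tells DDP not to wait for gradients
# from parameters that did not contribute to the loss.
model = DDP(Net(), find_unused_parameters=True)
loss = model(torch.randn(2, 4)).sum()
loss.backward()  # succeeds despite the unused parameter

dist.destroy_process_group()
```

As for forcing a single GPU: restricting visibility before launch (e.g. `CUDA_VISIBLE_DEVICES=0 python train.py`) is a common workaround, though whether train.py's multiprocessing setup tolerates a world size of 1 depends on how the repo's launcher is written.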


eriche2016 commented Dec 6, 2023

@RyanG41 Same issue here. Have you solved it?
