Fails to train with multiple GPUs in DP mode #35
Comments
Have you solved it? |
Sorry, I haven't, so I ran it in DDP mode [laugh and cry]. It runs well. |
This problem arises because some of the tensors involved in the computation are on the CPU while others are on the GPU. |
I added `anchors = anchors.to(device='cuda')` at line 141 in the loss.py file and that worked! (06.09.2021)
|
Here is the error detail:
Traceback (most recent call last):
File "/home/xxx/hard_disk/xxx/ScaledYOLOv4/train.py", line 438, in
train(hyp, opt, device, tb_writer)
File "/home/xxx/hard_disk/xxx/ScaledYOLOv4/train.py", line 255, in train
loss, loss_items = compute_loss(pred, targets.to(device), model) # scaled by batch_size
File "/home/xxx/hard_disk/xxx/ScaledYOLOv4/utils/general.py", line 446, in compute_loss
tcls, tbox, indices, anchors = build_targets(p, targets, model) # targets
File "/home/xxx/hard_disk/xxx/ScaledYOLOv4/utils/general.py", line 526, in build_targets
r = t[None, :, 4:6] / anchors[:, None] # wh ratio
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Process finished with exit code 1
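The fix described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual build_targets from utils/general.py: the helper name `ratio_wh` and the tensor shapes are made up for the example. It also uses `anchors.to(t.device)` rather than hard-coding `'cuda'`, which keeps the code working on CPU-only machines as well as under DP:

```python
import torch

def ratio_wh(t, anchors):
    # Hypothetical helper mirroring the line that crashed:
    #   r = t[None, :, 4:6] / anchors[:, None]  # wh ratio
    # Moving anchors onto the same device as the targets avoids the
    # "Expected all tensors to be on the same device" RuntimeError.
    anchors = anchors.to(t.device)
    return t[None, :, 4:6] / anchors[:, None]  # wh ratio

# Fake data with YOLO-style shapes: targets are (n, 6), anchors are (na, 2).
t = torch.rand(8, 6)
anchors = torch.rand(3, 2)
r = ratio_wh(t, anchors)
print(r.shape)  # broadcasts (1, 8, 2) / (3, 1, 2) -> torch.Size([3, 8, 2])
```

The device-agnostic `to(t.device)` form is generally preferable to `to(device='cuda')` because it follows whatever device the targets already live on, so the same code path works in CPU, single-GPU, DP, and DDP runs.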