I have searched the YOLOv5 issues and discussions and found no similar questions.
Question
Hello everyone,
unfortunately, I cannot select a specific CUDA device.
I run the following:
```
python train.py --epochs 10 --device 0
```
I get:
```
train: weights=yolov5s.pt, cfg=, data=data/coco128.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=10, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, evolve_population=data/hyps, resume_evolve=None, bucket=, cache=None, image_weights=False, device=0, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, ndjson_console=False, ndjson_file=False
github: up to date with https://github.com/ultralytics/yolov5 ✅
Traceback (most recent call last):
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/YOLOv5_test_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 306, in _lazy_init
    queued_call()
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/YOLOv5_test_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 174, in _check_capability
    capability = get_device_capability(d)
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/YOLOv5_test_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
    prop = get_device_properties(device)
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/YOLOv5_test_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 448, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/train.py", line 848, in <module>
    main(opt)
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/train.py", line 607, in main
    device = select_device(opt.device, batch_size=opt.batch_size)
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/utils/torch_utils.py", line 134, in select_device
    p = torch.cuda.get_device_properties(i)
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/YOLOv5_test_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/YOLOv5_test_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 312, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

CUDA call was originally invoked at:

  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/train.py", line 34, in <module>
    import torch
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/YOLOv5_test_env/lib/python3.10/site-packages/torch/__init__.py", line 1478, in <module>
    _C._initExtension(manager_path())
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/YOLOv5_test_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 238, in <module>
    _lazy_call(_check_capability)
  File "/raid/USERDATA/pawlodwp/yolo_detectors/yolov5/YOLOv5_test_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 235, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))
```
But if I simply omit `--device 0`, then it works. However, all CUDA devices are selected.
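Independent of the PyTorch version, the standard `CUDA_VISIBLE_DEVICES` environment variable can restrict which GPUs CUDA enumerates, as long as it is set before `torch` is imported. A minimal sketch (the `visible_devices` helper is illustrative only, and assumes numeric indices rather than GPU UUIDs):

```python
import os

# Expose only physical GPU 0 to CUDA. Must happen *before* `import torch`,
# because PyTorch reads this variable when it initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

def visible_devices(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device indices.

    Returns None when the variable is unset (i.e. all GPUs visible).
    Assumes numeric indices, not GPU UUID strings.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [int(tok) for tok in raw.split(",") if tok.strip()]

print(visible_devices())  # with the assignment above: [0]
```

Equivalently from the shell: `CUDA_VISIBLE_DEVICES=0 python train.py --epochs 10 --device 0`, since inside the process the single visible card is re-numbered as device 0.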
Hey there! 😊 Thanks for pinpointing that the issue seems tied to PyTorch v2.3. Different versions of PyTorch can have unique behaviors or bugs that affect other software, including YOLOv5.
For now, sticking with PyTorch v2.2.2 where you're not encountering this error sounds like a solid workaround. It's always good practice to test different versions of dependencies if you run into issues. If anything else comes up or if you have further questions, feel free to ask! Your observations make a valuable contribution to the community. 👍
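If you pin the dependency as suggested, a small guard before launching training can confirm the pin actually took effect in the active environment. `check_pin` is a hypothetical helper sketched on top of the standard `importlib.metadata` API:

```python
from importlib import metadata

def check_pin(package, required, get_version=metadata.version):
    """Return (installed_version, ok) for a pinned dependency.

    installed_version is None when the package is not installed;
    ok is True only when the installed version matches `required` exactly.
    """
    try:
        installed = get_version(package)
    except metadata.PackageNotFoundError:
        return None, False
    return installed, installed == required

# Example use before launching train.py (version numbers from this thread):
# version, ok = check_pin("torch", "2.2.2")
```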
I have installed the following pip packages:
Python 3.10.12