After 49 iterations, training always stops with the following error.
I am training without setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7.
Traceback (most recent call last):
  File "pytorch_connectomics/scripts/main.py", line 67, in <module>
    main()
  File "pytorch_connectomics/scripts/main.py", line 62, in main
    trainer.train()
  File "/n/home00/nwendt/zebrafish/pytorch_connectomics/connectomics/engine/trainer.py", line 92, in train
    GPUtil.showUtilization(all=True)
  File "/n/home00/nwendt/anaconda3/envs/py3_torch/lib/python3.7/site-packages/GPUtil/GPUtil.py", line 210, in showUtilization
    GPUs = getGPUs()
  File "/n/home00/nwendt/anaconda3/envs/py3_torch/lib/python3.7/site-packages/GPUtil/GPUtil.py", line 102, in getGPUs
    deviceIds = int(vals[i])
ValueError: invalid literal for int() with base 10: 'No devices were found'
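The failure in the traceback is a parse error: GPUtil tries to convert the string 'No devices were found' to an integer. That string appears to be a status message from the GPU query rather than a device id, though this is an assumption based on the message text. Below is a minimal sketch of a guard that would keep the utilization report from aborting training. The names `log_gpu_utilization` and `fake_show_utilization` are hypothetical and not part of pytorch_connectomics or GPUtil; the fake function only reproduces the same `ValueError`, so the snippet runs without a GPU.

```python
import logging

logger = logging.getLogger(__name__)


def log_gpu_utilization(show_fn):
    """Call a GPU-reporting function without letting a parse error stop training.

    `show_fn` is a hypothetical stand-in for a call such as
    `lambda: GPUtil.showUtilization(all=True)`.
    Returns True if the report ran, False if it was skipped.
    """
    try:
        show_fn()
        return True
    except ValueError as exc:
        # Same exception type as in the traceback: int() applied to a
        # message string instead of a numeric device id.
        logger.warning("Skipping GPU utilization report: %s", exc)
        return False


def fake_show_utilization():
    # Reproduces the failing conversion from the traceback (assumed cause).
    int("No devices were found")


if __name__ == "__main__":
    log_gpu_utilization(fake_show_utilization)
```

In the trainer, the call on line 92 could then be wrapped as `log_gpu_utilization(lambda: GPUtil.showUtilization(all=True))`. This only hides the reporting failure. It does not explain why no devices are visible at that point, which still needs to be investigated.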