Problems during training #5
Comments
I guess the problem may be that you removed "--multiprocessing_distributed" from the arguments. Can you run CUDA_VISIBLE_DEVICES=0 python bts_main.py arguments_train_nyu_debug.txt?
Thank you for your reply. I ran CUDA_VISIBLE_DEVICES=0 python bts_main.py arguments_train_nyu_debug.txt and the same problem occurred: "-- Process 0 terminated with the following error:". I did not modify --multiprocessing_distributed.
This is my arguments_train_nyu_debug.txt; I only modified the data path. --mode train --log_freq 10
Would you mind sharing your environment?
Of course. My GPU is an RTX 3090 with CUDA 11.1, and my environment is python=3.7.10.
I recommend torch==1.5.1 with CUDA 10.1, or pytorch==1.7.1 with CUDA 11.0. I have never tried torch>1.7.
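For anyone comparing their setup against the versions recommended above, the relevant values can be read from the interpreter. This is a minimal sketch; the helper name describe_environment is hypothetical and not part of this repository.

```python
import sys

import torch


def describe_environment():
    """Collect the version details that matter for this issue (hypothetical helper)."""
    cuda_ok = torch.cuda.is_available()
    return {
        "python": sys.version.split()[0],
        "torch": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": cuda_ok,
        # Name of the first visible GPU, if any.
        "gpu": torch.cuda.get_device_name(0) if cuda_ok else None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Pasting this output into a report makes it easier to tell whether a problem is tied to a particular torch/CUDA combination.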
After I installed pytorch 1.7.1, the problem was solved. Thank you very much for your answers!
Hello, I only have one GPU. When I try to train on the NYU dataset, I enter the following command:
CUDA_VISIBLE_DEVICES=0 python bts_main.py arguments_train_nyu.txt
and I get the following error:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/ace/PycharmProjects/TransDepth-main/pytorch/bts_main.py", line 439, in main_worker
    var_sum = np.sum(var_sum)
  File "<__array_function__ internals>", line 6, in sum
  File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2248, in sum
    initial=initial, where=where)
  File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
  File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/tensor.py", line 621, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
I tried some solutions I found, but none of them worked. How can I solve this? I sincerely look forward to your reply.
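The error message itself points at a code-level workaround: copy the values to host memory before NumPy touches them. The sketch below assumes that var_sum at line 439 is a list of per-parameter scalar tensors; that is a guess based on the traceback, not a confirmed reading of bts_main.py. The function name parameter_sum and the small Linear model are hypothetical stand-ins.

```python
import numpy as np
import torch


def parameter_sum(model):
    """Return (sum of all trainable parameter values, number of parameter tensors)."""
    # One scalar tensor per trainable parameter, on whatever device the model uses.
    per_param = [p.sum() for p in model.parameters() if p.requires_grad]
    # .item() copies each scalar to the host as a Python float,
    # so np.sum never receives a CUDA tensor.
    total = np.sum([v.item() for v in per_param])
    return float(total), len(per_param)


# Hypothetical stand-in model; falls back to CPU when no GPU is visible.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)

var_sum, var_cnt = parameter_sum(model)
print(f"initial variables' sum: {var_sum:.3f}, avg: {var_sum / var_cnt:.3f}")
```

Calling .cpu() on each tensor before np.sum should work as well. Either way this only addresses the conversion at that one line; switching to a PyTorch version the maintainers have tested remains the fix confirmed in this thread.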