
How to change the GPU device? #45

Closed
whuhxb opened this issue Apr 20, 2022 · 11 comments

whuhxb commented Apr 20, 2022

Hi, how do I change the GPU device in the code? The default is GPU:0.

yuxumin (Owner) commented Apr 20, 2022

Use the following command to train a model on GPU:1:

bash ./scripts/train.sh 1 --config <config> 

whuhxb (Author) commented Apr 20, 2022

@yuxumin OK, thanks. But when I change to GPU:1, I still get an OOM error when running the command: bash ./scripts/train.sh 2 --config ./cfgs/PCN_models/GRNet.yaml --exp_name example

init.py", line 22, in forward
min_x, max_x, min_y, max_y, min_z, max_z, gt_cloud)
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 31.75 GiB total capacity; 29.93 GiB already allocated; 451.50 MiB free; 29.99 GiB reserved in total by PyTorch)

It seems that the code still runs on GPU:0? I'm not sure.

yuxumin (Owner) commented Apr 20, 2022

It seems that the code still runs on GPU:0? I'm not sure.

The code in train.sh is CUDA_VISIBLE_DEVICES=${GPUS} python main.py ${PY_ARGS}, so I am sure the code runs on GPU:1.

bash ./scripts/train.sh 2 --config ./cfgs/PCN_models/GRNet.yaml --exp_name example

And this will put the model and data on GPU:2
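Note that CUDA_VISIBLE_DEVICES renumbers the visible GPUs: inside the process the selected card is always indexed as cuda:0, which is why the OOM message above still says "GPU 0" even though the job is physically running on another card. A minimal sketch to confirm which physical GPU is actually in use (not part of this repo; check_gpu.py is just a hypothetical script name), run as e.g. CUDA_VISIBLE_DEVICES=2 python check_gpu.py:

import os
import torch

# With CUDA_VISIBLE_DEVICES set, the remaining visible GPU(s) are re-indexed
# starting from 0, so PyTorch reports the selected card as device 0.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible device count:", torch.cuda.device_count())
print("current device index:", torch.cuda.current_device())
print("device name:", torch.cuda.get_device_name(0))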

whuhxb (Author) commented Apr 20, 2022

@yuxumin OK, I see. I tried to run on other GPU cluster nodes, but I got an error like this:

anaconda3/envs/pytorch-PoinTr/lib/python3.7/site-packages/knn_cuda/__init__.py", line 15, in load_cpp_ext
assert torch.cuda.is_available(), "torch.cuda.is_available() is False."
AssertionError: torch.cuda.is_available() is False.

I use CUDA 11.1 and pytorch==1.8.0; it seems that the GPU is not available now.
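A minimal sketch for narrowing this down (generic PyTorch checks, not repo code):

import torch

# torch.version.cuda is None for a CPU-only build; if it prints 11.1 but
# is_available() is still False, the node's NVIDIA driver is likely missing
# or too old for CUDA 11.1.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())

Running nvidia-smi on the same node also shows whether the driver can see the GPUs at all.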

whuhxb (Author) commented Apr 20, 2022

@yuxumin In which file do I change the batch size for GRNet.yaml and TopNet.yaml?

yuxumin (Owner) commented Apr 20, 2022

bash ./scripts/train.sh 2 --config ./cfgs/PCN_models/GRNet.yaml --exp_name example

This command uses the config file cfgs/PCN_models/GRNet.yaml.

whuhxb (Author) commented Apr 20, 2022

@yuxumin So total_bs: 32 in cfgs/PCN_models/GRNet.yaml is where I change the batch size, right?

yuxumin (Owner) commented Apr 20, 2022

yes
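For example, you can read the value back after editing to confirm the change took effect (a minimal sketch, assuming total_bs is a flat top-level key as quoted above; the repo's config loader may wrap it differently):

import yaml

# Re-read the edited config and print the batch size.
with open("cfgs/PCN_models/GRNet.yaml") as f:
    cfg = yaml.safe_load(f)
print("total_bs =", cfg.get("total_bs"))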

whuhxb (Author) commented Apr 20, 2022

@yuxumin I have changed it to 2, but the OOM still occurs.

yuxumin (Owner) commented Apr 20, 2022

GRNet needs to compute a gridding loss during training, which takes a lot of GPU memory. Can you try training other models rather than GRNet?
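If you want to confirm that this step is what dominates memory, PyTorch's built-in counters can bracket the suspect computation (a generic sketch with a stand-in tensor op, not the repo's actual gridding-loss code):

import torch

torch.cuda.reset_peak_memory_stats()

# Stand-in for a memory-heavy step such as the gridding-loss forward/backward;
# replace it with the real call to profile the actual model.
x = torch.randn(2, 3, 2048, device="cuda", requires_grad=True)
loss = (x.unsqueeze(-1) - x.unsqueeze(-2)).pow(2).mean()  # large broadcast intermediate
loss.backward()

print("peak allocated: %.1f MiB" % (torch.cuda.max_memory_allocated() / 2**20))

torch.cuda.memory_summary() gives a more detailed breakdown if the peak looks surprising.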

yuxumin (Owner) commented May 5, 2022

Closing this since there has been no response. Feel free to re-open it if the problem still exists.

yuxumin closed this as completed May 5, 2022