
How to change the GPU device? #45

Closed
whuhxb opened this issue Apr 20, 2022 · 11 comments

whuhxb commented Apr 20, 2022

Hi, how do I change the GPU device in the code? The default is GPU:0.

yuxumin (Owner) commented Apr 20, 2022

Use the following command to train a model on GPU:1:

bash ./scripts/train.sh 1 --config <config> 

whuhxb (Author) commented Apr 20, 2022

@yuxumin OK, thanks. But when I change to GPU:1, I still get an OOM error when running the command: bash ./scripts/train.sh 2 --config ./cfgs/PCN_models/GRNet.yaml --exp_name example

init.py", line 22, in forward
min_x, max_x, min_y, max_y, min_z, max_z, gt_cloud)
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 31.75 GiB total capacity; 29.93 GiB already allocated; 451.50 MiB free; 29.99 GiB reserved in total by PyTorch)

It seems that the code still runs on GPU:0? I'm not sure.

yuxumin (Owner) commented Apr 20, 2022

It seems that the code still runs on GPU:0? I'm not sure.

The code in train.sh is CUDA_VISIBLE_DEVICES=${GPUS} python main.py ${PY_ARGS}, so I am sure the code runs on GPU:1.

bash ./scripts/train.sh 2 --config ./cfgs/PCN_models/GRNet.yaml --exp_name example

And this will put the model and data on GPU:2
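Note that CUDA_VISIBLE_DEVICES renumbers the visible GPUs: inside the process the selected card is always indexed as cuda:0, which is why the OOM message above still says "GPU 0" even though the job is physically running on another card. A minimal sketch to confirm which physical GPU is actually in use (not part of this repo; check_gpu.py is just a hypothetical script name), run as e.g. CUDA_VISIBLE_DEVICES=2 python check_gpu.py:

import os
import torch

# With CUDA_VISIBLE_DEVICES set, the remaining visible GPU(s) are re-indexed
# starting from 0, so PyTorch reports the selected card as device 0.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible device count:", torch.cuda.device_count())
print("current device index:", torch.cuda.current_device())
print("device name:", torch.cuda.get_device_name(0))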

whuhxb (Author) commented Apr 20, 2022

@yuxumin OK, I see. I tried to run on other GPU cluster nodes, but I got an error like this:

anaconda3/envs/pytorch-PoinTr/lib/python3.7/site-packages/knn_cuda/__init__.py", line 15, in load_cpp_ext
assert torch.cuda.is_available(), "torch.cuda.is_available() is False."
AssertionError: torch.cuda.is_available() is False.

I use CUDA 11.1 and pytorch==1.8.0; it seems that the GPU is not available now.
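A minimal sketch for narrowing this down (generic PyTorch checks, not repo code):

import torch

# torch.version.cuda is None for a CPU-only build; if it prints 11.1 but
# is_available() is still False, the node's NVIDIA driver is likely missing
# or too old for CUDA 11.1.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())

Running nvidia-smi on the same node also shows whether the driver can see the GPUs at all.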

whuhxb (Author) commented Apr 20, 2022

@yuxumin In which file do I change the batch size for GRNet.yaml and TopNet.yaml?

yuxumin (Owner) commented Apr 20, 2022

bash ./scripts/train.sh 2 --config ./cfgs/PCN_models/GRNet.yaml --exp_name example

This command uses the config file cfgs/PCN_models/GRNet.yaml.

whuhxb (Author) commented Apr 20, 2022

@yuxumin So total_bs: 32 in cfgs/PCN_models/GRNet.yaml is where I change the batch size, right?

yuxumin (Owner) commented Apr 20, 2022

yes
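For example, you can read the value back after editing to confirm the change took effect (a minimal sketch, assuming total_bs is a flat top-level key as quoted above; the repo's config loader may wrap it differently):

import yaml

# Re-read the edited config and print the batch size.
with open("cfgs/PCN_models/GRNet.yaml") as f:
    cfg = yaml.safe_load(f)
print("total_bs =", cfg.get("total_bs"))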

whuhxb (Author) commented Apr 20, 2022

@yuxumin I have changed it to 2, but the OOM still occurs.

yuxumin (Owner) commented Apr 20, 2022

GRNet needs to compute a gridding loss during training, which takes a lot of GPU memory. Can you try training other models rather than GRNet?
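If you want to confirm that this step is what dominates memory, PyTorch's built-in counters can bracket the suspect computation (a generic sketch with a stand-in tensor op, not the repo's actual gridding-loss code):

import torch

torch.cuda.reset_peak_memory_stats()

# Stand-in for a memory-heavy step such as the gridding-loss forward/backward;
# replace it with the real call to profile the actual model.
x = torch.randn(2, 3, 2048, device="cuda", requires_grad=True)
loss = (x.unsqueeze(-1) - x.unsqueeze(-2)).pow(2).mean()  # large broadcast intermediate
loss.backward()

print("peak allocated: %.1f MiB" % (torch.cuda.max_memory_allocated() / 2**20))

torch.cuda.memory_summary() gives a more detailed breakdown if the peak looks surprising.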

yuxumin (Owner) commented May 5, 2022

Closing this since there has been no response. Feel free to re-open it if the problem still exists.

yuxumin closed this as completed May 5, 2022