Hello,
When I train the model on multiple GPUs, I see multiple processes on the main GPU (rank 0).
I haven't yet found out what is causing it.
I launch training with the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py
My main.py is as follows:
from rfdetr import RFDETRLarge
model = RFDETRLarge()
model.train(dataset_dir="mydataset/", epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4, output_dir="exp_train/")
nvidia-smi shows the following:
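One possible cause, which I have not confirmed for RF-DETR, is this: each of the four worker processes initializes CUDA on the default device (cuda:0) before it is pinned to its own GPU, for example while `RFDETRLarge()` builds the model. Every rank would then leave a context on GPU 0. Under that assumption, a minimal sketch of a workaround pins each process to its own GPU before the model is constructed. It uses the `LOCAL_RANK` variable that `torch.distributed.launch --use_env` exports to each worker. The helper names `local_rank_from_env` and `pin_process_to_gpu` are hypothetical and not part of rfdetr.

```python
import os


def local_rank_from_env(default: int = 0) -> int:
    """Return this worker's GPU index from the LOCAL_RANK environment variable.

    With --use_env, torch.distributed.launch exports LOCAL_RANK to each worker
    instead of passing a --local_rank argument. Falls back to `default` when
    the script is run without the launcher.
    """
    return int(os.environ.get("LOCAL_RANK", default))


def pin_process_to_gpu() -> int:
    """Hypothetical helper: make cuda:<local_rank> this process's default device.

    Assumption: it must run before anything touches CUDA (e.g. before
    RFDETRLarge()), otherwise an earlier allocation has already created a
    context on cuda:0. Returns the local rank so the caller can reuse it.
    """
    local_rank = local_rank_from_env()
    try:
        import torch
    except ImportError:
        # No PyTorch available: nothing to pin, but the sketch stays importable.
        return local_rank
    if torch.cuda.is_available():
        # Subsequent bare "cuda" allocations in this process go to this GPU.
        torch.cuda.set_device(local_rank)
    return local_rank
```

In main.py, `pin_process_to_gpu()` would be called first, followed by the existing `RFDETRLarge()` and `model.train(...)` lines unchanged. If the extra processes on GPU 0 are still there afterwards, the cause is something other than early CUDA initialization.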