Imbalance in Multi-GPU training #242

Hello, 

When I train the model on multiple GPUs, I see multiple processes attached to the main GPU (rank 0) instead of one process per GPU.

I haven't been able to find the cause yet.

The launch command is:

```Shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py
```

My `main.py` is:

```Python
from rfdetr import RFDETRLarge

model = RFDETRLarge()
model.train(dataset_dir="mydataset/", epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4, output_dir="exp_train/")
```

`nvidia-smi` shows:

![Image](https://github.com/user-attachments/assets/f3aa1a1c-659b-4234-9d6b-2633b05d52b1)
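
To help match the PIDs in the `nvidia-smi` output to ranks, a small diagnostic like the one below could be called at the top of `main.py`. This is only a sketch: `rank_info` and `expected_device` are hypothetical helper names, not part of `rfdetr`. It assumes the launcher exports `RANK`, `LOCAL_RANK` and `WORLD_SIZE` as environment variables, which `torch.distributed.launch` does when run with `--use_env`. It uses only the standard library, so it does not create a CUDA context of its own.

```Python
import os


def rank_info(env=None):
    """Collect the rank variables the launcher exports for this process."""
    env = os.environ if env is None else env
    return {
        "pid": os.getpid(),
        # Defaults cover a plain single-process run without a launcher.
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
        "visible_devices": env.get("CUDA_VISIBLE_DEVICES", "<unset>"),
    }


def expected_device(info):
    """Device this process would use if it follows its local rank (assumption)."""
    return f"cuda:{info['local_rank']}"


if __name__ == "__main__":
    info = rank_info()
    # One line per process; compare the PID with the PIDs listed by nvidia-smi.
    print(
        f"pid={info['pid']} rank={info['rank']}/{info['world_size']} "
        f"local_rank={info['local_rank']} expected={expected_device(info)} "
        f"CUDA_VISIBLE_DEVICES={info['visible_devices']}",
        flush=True,
    )
```

If a PID whose expected device is `cuda:1`, `cuda:2` or `cuda:3` also shows up under GPU 0 in `nvidia-smi`, that process has opened an extra context on GPU 0 in addition to its own device.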
