Replies: 1 comment
@haimat we don't assist with code customizations, but you might visit the Multi-GPU Training tutorial for DDP example commands. See the YOLOv5 Tutorials.
Good luck 🍀 and let us know if you have any other questions!
We use the callback system of the YOLOv5 training function `train()` from the `train.py` file to integrate it into our custom training frontend. This works fine, even in Multi-GPU DataParallel mode, which runs in a single process. Now we would like to train in Multi-GPU DistributedDataParallel mode; as far as I understand, this spawns multiple processes, right? If so, how can one train in DistributedDataParallel mode and still use the callbacks in the training script to get status updates during training?