
DDP nl fix #5332

Merged
merged 1 commit on Oct 25, 2021

Conversation

glenn-jocher (Member) commented Oct 25, 2021

Fix for #5160 (comment)

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Improved multi-GPU training support by ensuring hyperparameter scaling accounts for the DDP-wrapped model.

πŸ“Š Key Changes

  • Modified the retrieval of nl (the number of detection layers) to use the de_parallel function, so the detection head is reached correctly when the model is wrapped in Distributed Data Parallel (DDP) mode.

🎯 Purpose & Impact

  • Purpose: Ensures that the detection-layer count is retrieved correctly when the model is wrapped for multi-GPU (DDP) training.
  • Impact: Hyperparameters (hyp) are scaled accurately during multi-GPU training, which improves model performance and training stability. Users employing DDP benefit from correct hyperparameter adjustments irrespective of the number of GPUs used (a minimal sketch of the change follows this list). πŸš€
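The core of the change is small: unwrap the parallel container before reading nl from the detection head. The sketch below is illustrative only; TinyModel and DetectHead are hypothetical stand-ins for the real YOLOv5 model, and de_parallel is a simplified re-implementation of the unwrapping helper, assuming it returns model.module for DP/DDP-wrapped models.

```python
import torch.nn as nn


def de_parallel(model):
    """Return the underlying model, unwrapping DataParallel/DistributedDataParallel if present."""
    if isinstance(model, (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel)):
        return model.module
    return model


class DetectHead(nn.Module):
    """Hypothetical stand-in for a detection head that exposes `nl`."""
    def __init__(self, nl=3):
        super().__init__()
        self.nl = nl  # number of detection layers


class TinyModel(nn.Module):
    """Hypothetical stand-in for the training model: the last entry of .model is the detection head."""
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(nn.Identity(), DetectHead(nl=3))

    def forward(self, x):
        return x


model = nn.DataParallel(TinyModel())  # simulates the wrapping applied in multi-GPU training

# Before the fix: model.model[-1].nl raises AttributeError, because the DP/DDP wrapper
# stores the real network under .module and has no .model attribute of its own.
# After the fix: unwrap first, then read nl, which works with or without wrapping.
nl = de_parallel(model).model[-1].nl
print(nl)  # -> 3
```

In the training script, the nl value read this way is then used to scale the loss-gain hyperparameters, which is why an incorrect lookup under DDP would have skewed those values.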

Development

Successfully merging this pull request may close these issues.

Docker Multi-GPU DDP training hang on destroy_process_group() with wandb option 3