Multigpu and multinode performance #12424
Comments
@unrue hi there! Thanks for reaching out. This is a known behavior due to the overhead of synchronizing across multiple GPUs and inter-node communication. YOLOv5 and its multi-GPU capability are actively optimized, with scaling improvements ongoing. For real-time updates, please see the training best practices in our documentation. If you have further questions or feedback, feel free to let us know. 🚀
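For reference, below is a minimal sketch of how a two-node DDP launch of `train.py` typically looks, with the launcher flag names taken from the public YOLOv5 Multi-GPU training guide; the hostname, port, and dataset config are placeholders, not values from this issue:

```python
# Sketch: launching YOLOv5 DDP training across two 4-GPU nodes via
# torch.distributed.run. Hostname, port, and dataset paths are placeholders.
import subprocess
import sys

launcher = [
    sys.executable, "-m", "torch.distributed.run",
    "--nproc_per_node", "4",      # GPUs per node
    "--nnodes", "2",              # total number of nodes in the job
    "--master_addr", "node01",    # placeholder: hostname of the rank-0 node
    "--master_port", "29500",     # placeholder: any free TCP port
]
train_cmd = [
    "train.py",
    "--batch", "64",              # total batch size, split across all GPUs
    "--data", "custom.yaml",      # placeholder dataset config
    "--weights", "yolov5s.pt",
    "--device", "0,1,2,3",
    "--epochs", "100",
]

# Run this on the first node; on the second node use --node_rank 1.
subprocess.run(launcher + ["--node_rank", "0"] + train_cmd, check=True)
```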
Thanks Glenn, so in fact, at the moment, multi-GPU multi-node on YOLO is not useful. In the above link, I don't see any tip to improve multi-GPU performance. I'm using YOLO on an HPC cluster with a lot of GPUs available. But if YOLO does not scale up, I'm limited to running on a single node :/ I'll follow future updates. Thanks.
@unrue you're welcome! I appreciate your understanding. Our team is actively working to enhance multi-GPU and multi-node performance, and we value your feedback in this process. Your support and patience mean a lot. If you have any more questions or run into any issues, feel free to ask. We're here to help!
Thanks Glenn. Apart from time performance, are there other reasons to enable multi-node in YOLO? More data processing?
@unrue Absolutely, multinode setups can certainly enable larger-scale data processing and model training when dealing with massive datasets and resource-intensive tasks. This can be especially beneficial for distributed data parallel training or for handling extremely large models. Keep an eye on our updates for improvements and new features in this area. If you have any more questions, feel free to ask. Good luck with your work! 🌟
Thanks Glenn, yes, I have another question. Suppose YOLO starts with 4 GPUs and 50 epochs. In a second test, YOLO runs with 8 GPUs; in that case, should the number of epochs be 25? Or do the epochs remain the same? I mean, should the number of epochs be scaled down as the number of GPUs grows, or does it remain constant? Thanks.
@unrue The number of epochs should remain constant regardless of the number of GPUs used. You do not need to resize the number of epochs when scaling up the number of GPUs. However, when increasing the number of GPUs, you may observe shorter wall-clock training time due to increased parallelism, since each epoch is spread across more workers. If you have any more questions or need further clarification, feel free to ask. Happy to help!
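To illustrate what typically changes (and what does not) when going from 4 to 8 GPUs under DDP: the epoch count stays fixed, the total batch is split across workers, and a common heuristic (the linear scaling rule) increases the learning rate in proportion to the total batch. The base values in this sketch are illustrative, not YOLOv5's internal defaults:

```python
# Sketch: epochs stay fixed when adding GPUs; only the per-GPU batch and
# (optionally, via the linear scaling heuristic) the learning rate change.
def ddp_schedule(total_batch: int, n_gpus: int,
                 base_lr: float = 0.01, base_batch: int = 64,
                 epochs: int = 100):
    assert total_batch % n_gpus == 0, "total batch must divide evenly across GPUs"
    per_gpu_batch = total_batch // n_gpus
    scaled_lr = base_lr * total_batch / base_batch   # linear scaling heuristic
    return {"epochs": epochs,             # unchanged regardless of GPU count
            "per_gpu_batch": per_gpu_batch,
            "lr": scaled_lr}

print(ddp_schedule(total_batch=64, n_gpus=4))    # 16 images per GPU
print(ddp_schedule(total_batch=128, n_gpus=8))   # 16 images per GPU, larger LR
```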
Do you already have an idea why YOLO does not scale? Where is the bottleneck?
@unrue The main bottleneck in scaling YOLOv5 across multiple GPUs and nodes is the communication and synchronization overhead between the GPUs. Our team is actively working to optimize and improve the scalability of YOLOv5, so keep an eye out for updates as we continue to address these challenges. Your feedback is invaluable as we work to enhance the multi-GPU and multi-node performance. If you have further questions or need assistance, feel free to ask. Thank you for your understanding and support!
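One rough way to check whether inter-GPU and inter-node communication really is the limiting factor is to benchmark raw NCCL all-reduce bandwidth separately from training. A minimal sketch follows; the payload size and iteration count are arbitrary choices, and it is meant to be launched with torchrun, one process per GPU:

```python
# Sketch: rough NCCL all-reduce bandwidth check, launched with torchrun.
# Low effective bandwidth across nodes would point at the interconnect
# as the cause of poor gradient-synchronization scaling.
import os
import time
import torch
import torch.distributed as dist

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    n_elems = 25_000_000                      # ~100 MB of float32 (arbitrary)
    x = torch.randn(n_elems, device="cuda")

    # Warm-up so NCCL sets up its communicators before timing.
    for _ in range(5):
        dist.all_reduce(x)
    torch.cuda.synchronize()

    iters = 20
    t0 = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - t0

    gb_moved = x.numel() * x.element_size() * iters / 1e9
    if dist.get_rank() == 0:
        print(f"all_reduce: {gb_moved / elapsed:.1f} GB/s effective")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```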
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help. For additional resources and information, please see the links below:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed! Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
Search before asking
Question
I'm training YOLOv5 on a custom dataset on an HPC machine with 4 GPUs per node. I'm doing some performance tests using different numbers of GPUs. Each test runs for 100 epochs. The following are the time results:
The AP is more or less the same. I'm a bit confused. Why doesn't YOLOv5 training time scale with more GPUs? Each epoch should finish in less time with more GPUs, and so should the total execution time, right? Could someone explain this behaviour? Thanks.
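A quick way to quantify "does not scale" is scaling efficiency: the measured speedup divided by the ideal speedup. A minimal sketch follows; the timings are placeholder values, not actual measurements from this issue:

```python
# Sketch: speedup and scaling efficiency from wall-clock training times.
# All numbers below are hypothetical placeholders.
baseline_gpus, baseline_time = 1, 10.0    # hours for 100 epochs (placeholder)
measurements = {2: 6.0, 4: 4.0, 8: 3.5}   # gpus -> hours (placeholders)

for gpus, t in measurements.items():
    speedup = baseline_time / t
    efficiency = speedup / (gpus / baseline_gpus)   # 1.0 = perfect scaling
    print(f"{gpus} GPUs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```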
Additional
No response