
Command/file generator for DDP #79

Merged
merged 5 commits into from
Dec 17, 2022
Conversation

AyushExel
Contributor

@AyushExel AyushExel commented Dec 16, 2022

@Laughing-q I attempted to create a workflow where `generate_ddp_command` checks whether the command is coming from the CLI. If so, it creates a temp file with the training command inside an `if __name__ == "__main__":` block and uses that file to generate the DDP command.

Note:

  • The temp file is currently created in the current working dir. We can change it to `ultralytics/` once we confirm that it works.
  • The temp file probably needs to be deleted manually, as it persists. Again, we can handle that once DDP works.

I have checked up to the command/file generation, and the syntax in the generated file seems correct. Can you check whether it also works as expected with DDP?
I have made this PR against your branch instead of committing there directly, as I'm not sure it'll work. Merge this into your branch if it does.
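The workflow described above can be sketched roughly as follows. The function name `generate_ddp_file` matches the PR, but the signature, parameter names, and file-naming scheme here are assumptions for illustration, not the actual implementation:

```python
import os
import tempfile

def generate_ddp_file(trainer_module: str, trainer_class: str, args: dict) -> str:
    """Write a temporary training script for the DDP launcher to run."""
    # The __main__ guard matters: each DDP worker process re-imports this
    # file, and only direct execution should kick off training.
    content = (
        f"from {trainer_module} import {trainer_class}\n\n"
        'if __name__ == "__main__":\n'
        f"    trainer = {trainer_class}(**{args!r})\n"
        "    trainer.train()\n"
    )
    # delete=False keeps the file on disk after writing, since the DDP
    # launcher reads it later; as noted above, it currently lands in the
    # working directory and must be cleaned up manually.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", prefix="_temp_ddp_", dir=os.getcwd(), delete=False
    ) as f:
        f.write(content)
    return f.name
```

A quick sanity check is to call it with dummy arguments and inspect the generated file before handing it to the launcher.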

πŸ› οΈ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhancement of the multi-GPU training setup in the YOLO training engine.

📊 Key Changes

  • Updated the train method in trainer.py to pass self to generate_ddp_command.
  • Created a new utility generate_ddp_file in dist.py that generates a temporary Python file for distributed training configuration.
  • Modified generate_ddp_command in dist.py to handle CLI usage and utilize the new generate_ddp_file when necessary.

🎯 Purpose & Impact

  • 🚀 The changes aim to streamline the setup process for distributed training across multiple GPUs.
  • 🛠 By creating a temporary file with the training configuration, it simplifies command generation, particularly when using the command line interface.
  • 🤝 This update is expected to make it easier to initiate distributed training, potentially leading to more efficient and user-friendly multi-GPU training experiences for YOLO models.
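Putting the key changes together, a `generate_ddp_command` along these lines would build the launch command around the generated temp file. The flag layout mirrors `torch.distributed.run`'s documented interface, but the exact signature and defaults here are assumptions for the sketch, not the code merged in this PR:

```python
import sys

def generate_ddp_command(world_size: int, temp_file: str) -> list:
    # Launch one worker process per GPU with torch.distributed.run,
    # pointing it at the generated temporary training script.
    return [
        sys.executable, "-m", "torch.distributed.run",
        "--nproc_per_node", str(world_size),
        temp_file,
    ]
```

The trainer then typically hands this list to `subprocess.run(cmd)` so each spawned rank re-executes the temp script under its own process.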

@Laughing-q
Member

@AyushExel okay tested it and it works. Should I just merge it to DDP?

@AyushExel
Contributor Author

@Laughing-q Yeah, sure. Do you see the merge option?

@Laughing-q Laughing-q merged commit fc04821 into DDP Dec 17, 2022
@Laughing-q Laughing-q deleted the DDP_2 branch December 17, 2022 12:05